Introduction to Symbol Grounding
( 1 min )
The recent upheavals at OpenAI and OpenAI’s Chief Scientist’s apprehensions regarding the “safety” of AI have ignited a fresh wave of concerns and fears about the march towards Artificial General Intelligence (AGI) and “Super Intelligence.” AI safety concerns the development of AI systems that are aligned with human values and do not cause harm to humans. Some… Read More »A Different AI Scenario: AI and Justice in a Brave New World – Part 1
The post A Different AI Scenario: AI and Justice in a Brave New World – Part 1 appeared first on Data Science Central.
( 22 min )
Climate hazards can cause major disasters when they occur simultaneously as
compound hazards. To understand the distribution of climate risk and inform
adaptation policies, scientists need to simulate a large number of physically
realistic and spatially coherent events. Current methods are limited by
computational constraints, and the probabilistic spatial distribution of
compound events is not given sufficient attention. The bottleneck in current
approaches lies in modelling the dependence structure between variables, as
inference on parametric models suffers from the curse of dimensionality.
Generative adversarial networks (GANs) are well-suited to such a problem due to
their ability to implicitly learn the distribution of data in high-dimensional
settings. We employ a GAN to model the dependence structure for daily maximum
wind speed, significant wave height, and total precipitation over the Bay of
Bengal, combining this with traditional extreme value theory for controlled
extrapolation of the tails. Once trained, the model can be used to efficiently
generate thousands of realistic compound hazard events, which can inform
climate risk assessments for climate adaptation and disaster preparedness. The
method developed is flexible and transferable to other multivariate and spatial
climate datasets.
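The hybrid of an implicit generative model with extreme value theory rests on a standard semiparametric trick: learn the dependence structure on uniform margins, then map samples back with an empirical body and a generalised Pareto tail above a high threshold. A minimal numpy sketch of that marginal transform (the GAN itself, and the threshold and GPD parameters `u`, `sigma`, `xi`, are illustrative stand-ins here, not the paper's fitted values):

```python
import numpy as np

def to_uniform(x):
    """Empirical probability-integral transform: map a 1-D sample to (0, 1)."""
    ranks = np.argsort(np.argsort(x))
    return (ranks + 1) / (len(x) + 1)

def gpd_quantile(p, u, zeta_u, sigma, xi):
    """Inverse CDF above a high threshold u using a generalised Pareto tail.

    zeta_u is the empirical exceedance probability P(X > u); sigma and xi are
    the GPD scale and shape. Valid for p >= 1 - zeta_u and xi != 0.
    """
    return u + (sigma / xi) * (((1 - p) / zeta_u) ** (-xi) - 1)

def from_uniform(p, sample, u, sigma, xi):
    """Map uniform draws back to the data scale: empirical quantiles in the
    bulk, a fitted GPD tail above the threshold u for controlled extrapolation."""
    zeta_u = np.mean(sample > u)
    body = np.quantile(sample, p)
    tail = gpd_quantile(p, u, zeta_u, sigma, xi)
    return np.where(p < 1 - zeta_u, body, tail)

rng = np.random.default_rng(0)
wind = rng.gumbel(loc=10.0, scale=2.0, size=5000)   # stand-in for wind speed
p = to_uniform(wind)                                 # dependence model sees these
u = np.quantile(wind, 0.95)
# sigma and xi below are illustrative values, not a real fit
x_back = from_uniform(p, wind, u, sigma=2.0, xi=0.1)
```

In the full pipeline the GAN would be trained on the uniform-margin fields and its samples passed through `from_uniform`, so tail behaviour is controlled by the GPD rather than memorised by the network.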
( 2 min )
Inference of community structure in probabilistic graphical models may not be
consistent with fairness constraints when nodes have demographic attributes.
Certain demographics may be over-represented in some detected communities and
under-represented in others. This paper defines a novel $\ell_1$-regularized
pseudo-likelihood approach for fair graphical model selection. In particular,
we assume there is some community or clustering structure in the true
underlying graph, and we seek to learn a sparse undirected graph and its
communities from the data such that demographic groups are fairly represented
within the communities. In the case when the graph is known a priori, we
provide a convex semidefinite programming approach for fair community
detection. We establish the statistical consistency of the proposed method for
both a Gaussian graphical model and an Ising model for, respectively,
continuous and binary data, proving that our method can recover the graphs and
their fair communities with high probability.
( 2 min )
Analyzing large-scale time-series network data, such as social media and
email communications, poses a significant challenge in understanding social
dynamics, detecting anomalies, and predicting trends. In particular, the
scalability of graph analysis is a critical hurdle impeding progress in
large-scale downstream inference. To address this challenge, we introduce a
temporal encoder embedding method. This approach leverages ground-truth or
estimated vertex labels, enabling an efficient embedding of large-scale graph
data and the processing of billions of edges within minutes. Furthermore, this
embedding unveils a temporal dynamic statistic capable of detecting
communication pattern shifts across all levels, ranging from individual
vertices to vertex communities and the overall graph structure. We provide
theoretical support to confirm its soundness under random graph models, and
demonstrate its numerical advantages in capturing evolving communities and
identifying outliers. Finally, we showcase the practical application of our
approach by analyzing an anonymized time-series communication network from a
large organization spanning 2019-2020, enabling us to assess the impact of
Covid-19 on workplace communication patterns.
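The encoder-embedding idea behind this approach can be illustrated in a few lines: given ground-truth or estimated labels, each vertex is embedded as its average connectivity to every class, which costs one sparse multiplication per time window. A hypothetical single-snapshot numpy sketch (the temporal method applies this per snapshot and tracks the resulting statistics over time):

```python
import numpy as np

def encoder_embedding(A, labels, K):
    """Label-based graph encoder embedding: project the adjacency matrix onto
    class-indicator columns scaled by class size, giving an n x K embedding
    in time linear in the number of edges (for sparse A)."""
    n = len(labels)
    W = np.zeros((n, K))
    for k in range(K):
        idx = labels == k
        W[idx, k] = 1.0 / idx.sum()
    return A @ W  # Z[i, k] = average connectivity of vertex i to class k

# toy two-community graph
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 50)
P = np.where(labels[:, None] == labels[None, :], 0.5, 0.05)
A = (rng.random((100, 100)) < P).astype(float)
np.fill_diagonal(A, 0)
Z = encoder_embedding(A, labels, 2)
```

Because the embedding is just class-averaged connectivity, shifts in communication patterns show up directly as movement of the embedded vertices between snapshots.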
( 3 min )
This paper studies the one-shot behavior of no-regret algorithms for
stochastic bandits. Although many algorithms are known to be asymptotically
optimal with respect to the expected regret, over a single run, their
pseudo-regret seems to follow one of two tendencies: it is either smooth or
bumpy. To measure this tendency, we introduce a new notion: the sliding regret,
which measures the worst pseudo-regret over a time-window of fixed length
sliding to infinity. We show that randomized methods (e.g. Thompson Sampling
and MED) have optimal sliding regret, while index policies, although possibly
asymptotically optimal for the expected regret, have the worst possible sliding
regret under regularity conditions on their index (e.g. UCB, UCB-V, KL-UCB,
MOSS, IMED etc.). We further analyze the average bumpiness of the pseudo-regret
of index policies via the regret of exploration, that we show to be suboptimal
as well.
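The abstract does not state the formal definition; one formalization consistent with the description, writing $\bar{R}_t = \sum_{s=1}^{t} \Delta_{A_s}$ for the pseudo-regret after $t$ rounds (with $\Delta_a$ the suboptimality gap of arm $a$), is the worst pseudo-regret accrued over any length-$\tau$ window arbitrarily far in the future:

$$ \mathrm{SlidingRegret}_\tau \;=\; \limsup_{t \to \infty} \left( \bar{R}_{t+\tau} - \bar{R}_t \right). $$

A "smooth" run keeps this quantity small for fixed $\tau$, while a "bumpy" index policy can pay a large burst of regret inside a single window even when its expected regret is asymptotically optimal.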
( 2 min )
Lipschitz continuity is a crucial functional property of any predictive
model, one that naturally governs its robustness, generalisation, and
adversarial vulnerability. In contrast to other works that focus on obtaining
tighter bounds and developing practical strategies to enforce certain
Lipschitz properties, we aim to thoroughly examine and characterise the
Lipschitz behaviour of Neural Networks.
investigation in a range of different settings (namely, architectures,
datasets, label noise, and more) by exhausting the limits of the simplest and
the most general lower and upper bounds. As a highlight of this investigation,
we showcase a remarkable fidelity of the lower Lipschitz bound, identify a
striking Double Descent trend in both the upper and lower bounds on the
Lipschitz constant, and explain the intriguing effects of label noise on
function smoothness and
generalisation.
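The "simplest and most general" bounds referenced above are easy to reproduce for a small network: the product of the layers' spectral norms upper-bounds the Lipschitz constant, while the largest Jacobian norm observed on data lower-bounds it. A sketch for a hypothetical two-layer ReLU network (not the paper's experimental setup):

```python
import numpy as np

def jacobian(W1, W2, x):
    """Exact input-output Jacobian of the two-layer ReLU network W2 @ relu(W1 @ x)."""
    mask = (W1 @ x > 0).astype(float)
    return W2 @ (mask[:, None] * W1)

def lipschitz_bounds(W1, W2, X):
    """Simple lower/upper bounds on the Lipschitz constant (spectral-norm sense).
    Lower: largest Jacobian norm observed over the data points.
    Upper: product of the layers' spectral norms."""
    lower = max(np.linalg.norm(jacobian(W1, W2, x), 2) for x in X)
    upper = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)
    return lower, upper

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(64, 10)), rng.normal(size=(1, 64))
X = rng.normal(size=(200, 10))
lo, up = lipschitz_bounds(W1, W2, X)
```

The gap between `lo` and `up` is exactly what the empirical investigation probes: the lower bound tracks the function the network actually computes, the upper bound only its weights.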
( 2 min )
The Fermat distance has recently been established as a useful tool for
machine learning tasks when a natural distance is not directly available to the
practitioner, or to improve the results given by Euclidean distances by
exploiting the geometrical and statistical properties of the dataset. This
distance depends on a parameter $\alpha$ that greatly impacts the performance
of subsequent tasks. Ideally, the value of $\alpha$ should be large enough to
navigate the geometric intricacies inherent to the problem. At the same time,
it should remain restrained enough to sidestep any deleterious ramifications
stemming from noise during the process of distance estimation. We study both
theoretically and through simulations how to select this parameter.
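For context, the sample Fermat distance is a shortest-path metric over the point cloud with edge costs $\|x_i - x_j\|^\alpha$: as $\alpha$ grows, paths through many nearby points become cheaper than direct jumps, which is what lets the distance follow the data's geometry, and also what makes it sensitive to noise. A small numpy sketch using Floyd-Warshall:

```python
import numpy as np

def fermat_distance(X, alpha):
    """Sample Fermat distance: shortest paths on the complete graph over the
    points with edge weights ||x_i - x_j||^alpha, via Floyd-Warshall (O(n^3))."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.linalg.norm(diff, axis=-1) ** alpha
    for k in range(len(X)):  # allow point k as an intermediate hop
        D = np.minimum(D, D[:, k, None] + D[None, k, :])
    return D

rng = np.random.default_rng(0)
X = rng.random((60, 2))
D1 = fermat_distance(X, alpha=1.0)  # reduces to the plain Euclidean metric
D3 = fermat_distance(X, alpha=3.0)  # larger alpha favours many short hops
```

With $\alpha = 1$ the triangle inequality makes every direct edge optimal, so the Fermat distance coincides with the Euclidean one; the interesting regime, and the trade-off the paper studies, is $\alpha > 1$.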
( 2 min )
Virtually all machine learning tasks are characterized using some form of
loss function, and "good performance" is typically stated in terms of a
sufficiently small average loss, taken over the random draw of test data. While
optimizing for performance on average is intuitive, convenient to analyze in
theory, and easy to implement in practice, such a choice brings about
trade-offs. In this work, we survey and introduce a wide variety of
non-traditional criteria used to design and evaluate machine learning
algorithms, place the classical paradigm within the proper historical context,
and propose a view of learning problems which emphasizes the question of "what
makes for a desirable loss distribution?" in place of tacit use of the expected
loss.
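One concrete example of such a non-traditional criterion is conditional value-at-risk (CVaR), which replaces the average loss with the mean of the worst tail, directly encoding a preference over the loss distribution rather than over its expectation alone:

```python
import numpy as np

def cvar(losses, level=0.95):
    """Conditional value-at-risk: the mean loss within the worst (1 - level)
    fraction of draws, an alternative criterion to the plain expected loss."""
    q = np.quantile(losses, level)
    return losses[losses >= q].mean()

rng = np.random.default_rng(0)
losses = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
avg, tail = losses.mean(), cvar(losses, 0.95)
```

Two predictors with identical average loss can have very different CVaR, which is precisely the kind of distinction the expected-loss paradigm cannot express.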
( 2 min )
This paper presents a comprehensive comparative analysis of the performance
of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks
(QNN), juxtaposed against their classical counterparts: Equivariant Neural
Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of
each network with two toy examples for a binary classification task, focusing
on model complexity (measured by the number of parameters) and the size of the
training data set. Our results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$
EQNN and the QNN provide superior performance for smaller parameter sets and
modest training data samples.
( 2 min )
The ability to quickly build and deploy machine learning (ML) models is becoming increasingly important in today’s data-driven world. However, building ML models requires significant time, effort, and specialized expertise. From data collection and cleaning to feature engineering, model building, tuning, and deployment, ML projects often take months for developers to complete. And experienced data […]
( 10 min )
Launched in 2019, Amazon SageMaker Studio provides one place for all end-to-end machine learning (ML) workflows, from data preparation, building and experimentation, training, hosting, and monitoring. As we continue to innovate to increase data science productivity, we’re excited to announce the improved SageMaker Studio experience, which allows users to select the managed Integrated Development Environment (IDE) […]
( 6 min )
As organizations scale the adoption of machine learning (ML), they are looking for efficient and reliable ways to deploy new infrastructure and onboard teams to ML environments. One of the challenges is setting up authentication and fine-grained permissions for users based on their roles and activities. For example, MLOps engineers typically perform model deployment activities, […]
( 8 min )
PwR uses domain-specific languages to bridge communication between developers and AI tools. Learn how it can help simplify code creation and enhance software reliability and customization, no matter your coding expertise.
The post PwR: Using representations for AI-powered software development appeared first on Microsoft Research.
( 10 min )
Concept erasure in text-to-image diffusion models aims to prevent pre-trained
diffusion models from generating images related to a target concept. To perform
reliable concept erasure, the properties of robustness and locality are
desirable. The former prevents the model from producing images associated with
the target concept for any paraphrased or learned prompts, while the latter
preserves the model's ability to generate images for non-target concepts. In
this paper, we propose Reliable Concept Erasing via Lightweight Erasers
(Receler), which learns a lightweight Eraser to perform concept erasing and
enhances locality and robustness with the proposed concept-localized
regularization and adversarial prompt learning, respectively. Comprehensive
quantitative and qualitative experiments with various concept prompts verify
the superiority of Receler over the previous erasing methods on the above two
desirable properties.
( 2 min )
Multivariate time series have many applications, from healthcare and
meteorology to life science. Although deep learning models have shown excellent
predictive performance for time series, they have been criticised for being
"black-boxes" or non-interpretable. This paper proposes a novel modular neural
network model for multivariate time series prediction that is interpretable by
construction. A recurrent neural network learns the temporal dependencies in
the data while an attention-based feature selection component selects the most
relevant features and suppresses redundant features used in the learning of the
temporal dependencies. A modular deep network is trained from the selected
features independently to show the users how features influence outcomes,
making the model interpretable. Experimental results show that this approach
can outperform state-of-the-art interpretable Neural Additive Models (NAM) and
variations thereof in both regression and classification tasks on time series,
achieving a predictive performance that is comparable to the top
non-interpretable methods for time series, LSTM and XGBoost.
( 2 min )
Determining whether a property is priced fairly is difficult for buyers and
sellers, since they usually do not have an objective view of the price
distribution for the overall market of their interest. Drawing on data
collected on all available properties for rent in Manhattan as of September
2023, this paper aims to strengthen our understanding of model residuals,
specifically for machine learning models which generalize for a majority of the
distribution of a well-proportioned dataset. Most models treat deviations from
predicted values as mere inaccuracies; this paper, however, proposes a different
vantage point: when generalizing to at least 75\% of the dataset, the
remaining deviations reveal significant insights. To harness these insights, we
introduce the Price Anomaly Score (PAS), a metric capable of capturing
boundaries between irregularly predicted prices. By combining relative pricing
discrepancies with statistical significance, the Price Anomaly Score (PAS)
offers a multifaceted view of rental valuations. This metric allows experts to
identify overpriced or underpriced properties within a dataset by aggregating
PAS values, then fine-tuning upper and lower boundaries to any threshold to set
indicators of choice.
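The abstract does not give the PAS formula, but a plausible minimal version of the idea, combining relative pricing discrepancy with a standardized scale so thresholds can be set in units of standard deviations, might look like this (all names and values here are hypothetical):

```python
import numpy as np

def price_anomaly_score(actual, predicted):
    """A plausible residual-based score in the spirit of the abstract (the
    exact PAS definition is not stated there): relative pricing discrepancy,
    standardized so that thresholds are expressed in standard deviations."""
    rel = np.log(actual) - np.log(predicted)   # relative discrepancy
    return (rel - rel.mean()) / rel.std()

rng = np.random.default_rng(0)
predicted = rng.uniform(2000, 6000, size=500)           # model rents, USD/month
actual = predicted * rng.lognormal(0.0, 0.1, size=500)  # observed market rents
actual[0] = predicted[0] * 2.0                          # one overpriced listing
pas = price_anomaly_score(actual, predicted)
flagged_overpriced = np.where(pas > 3.0)[0]
```

Tuning the upper and lower cut-offs on `pas` is what the abstract describes as fine-tuning boundaries to any threshold of choice.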
( 3 min )
Traditional multi-view stereo (MVS) methods rely heavily on photometric and
geometric consistency constraints, but newer machine learning-based MVS methods
check geometric consistency across multiple source views only as a
post-processing step. In this paper, we present a novel approach that
explicitly encourages geometric consistency of reference view depth maps across
multiple source views at different scales during learning (see Fig. 1). We find
that adding this geometric consistency loss significantly accelerates learning
by explicitly penalizing geometrically inconsistent pixels, reducing the
training iteration requirements to nearly half that of other MVS methods. Our
extensive experiments show that our approach achieves a new state-of-the-art on
the DTU and BlendedMVS datasets, and competitive results on the Tanks and
Temples benchmark. To the best of our knowledge, GC-MVSNet is the first attempt
to enforce multi-view, multi-scale geometric consistency during learning.
( 2 min )
Text-To-Image (TTI) models, such as DALL-E and StableDiffusion, have
demonstrated remarkable prompt-based image generation capabilities.
Multilingual encoders may have a substantial impact on the cultural agency of
these models, as language is a conduit of culture. In this study, we explore
the cultural perception embedded in TTI models by characterizing culture across
three hierarchical tiers: cultural dimensions, cultural domains, and cultural
concepts. Based on this ontology, we derive prompt templates to unlock the
cultural knowledge in TTI models, and propose a comprehensive suite of
evaluation techniques, including intrinsic evaluations using the CLIP space,
extrinsic evaluations with a Visual-Question-Answer (VQA) model and human
assessments, to evaluate the cultural content of TTI-generated images. To
bolster our research, we introduce the CulText2I dataset, derived from four
diverse TTI models and spanning ten languages. Our experiments provide insights
regarding Do, What, Which and How research questions about the nature of
cultural encoding in TTI models, paving the way for cross-cultural applications
of these models.
( 2 min )
Hyperparameter Optimization (HPO) of Deep Learning-based models tends to be a
compute-resource-intensive process, as it usually requires training the target
model with many different hyperparameter configurations. We show that
integrating model performance prediction with early stopping methods holds
great potential to speed up the HPO process of deep learning models. Moreover,
we propose a novel algorithm called Swift-Hyperband that can use either
classical or quantum support vector regression for performance prediction and
benefit from distributed High Performance Computing environments. This
algorithm is tested not only for the Machine-Learned Particle Flow model used
in High Energy Physics, but also for a wider range of target models from
domains such as computer vision and natural language processing.
Swift-Hyperband is shown to find comparable (or better) hyperparameters while
using less computational resources in all test cases.
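The core mechanism, predicting a trial's final performance from a partial learning curve and stopping unpromising trials early, can be sketched with an ordinary-least-squares curve model standing in for the (classical or quantum) support vector regressor; the curves and horizon below are made up for illustration:

```python
import numpy as np

def predict_final(val_acc, horizon):
    """Extrapolate a partial validation curve with the simple saturating model
    acc(t) ~ a + b / t, fitted by least squares; a lightweight stand-in for
    the support-vector-regression predictor used by Swift-Hyperband."""
    t = np.arange(1, len(val_acc) + 1)
    A = np.column_stack([np.ones_like(t, dtype=float), 1.0 / t])
    coef, *_ = np.linalg.lstsq(A, np.asarray(val_acc, dtype=float), rcond=None)
    a, b = coef
    return a + b / horizon

# two hypothetical trials observed for 5 epochs, target horizon 50 epochs
good = [0.60, 0.75, 0.80, 0.83, 0.84]
bad = [0.40, 0.50, 0.53, 0.55, 0.56]
keep_good = predict_final(good, 50) > predict_final(bad, 50)
```

An HPO loop would call such a predictor after a few epochs of each configuration and terminate the trials whose extrapolated final score falls below the current front-runners.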
( 2 min )
Tensor network (TN) representation is a powerful technique for computer
vision and machine learning. TN structure search (TN-SS) aims to search for a
customized structure to achieve a compact representation, which is a
challenging NP-hard problem. Recent "sampling-evaluation-based" methods require
sampling an extensive collection of structures and evaluating them one by one,
resulting in prohibitively high computational costs. To address this issue, we
propose a novel TN paradigm, named SVD-inspired TN decomposition (SVDinsTN),
which allows us to efficiently solve the TN-SS problem from a regularized
modeling perspective, eliminating the repeated structure evaluations. To be
specific, by inserting a diagonal factor for each edge of the fully-connected
TN, SVDinsTN allows us to calculate TN cores and diagonal factors
simultaneously, with the factor sparsity revealing a compact TN structure. In
theory, we prove a convergence guarantee for the proposed method. Experimental
results demonstrate that the proposed method achieves approximately 100 to 1000
times acceleration compared to the state-of-the-art TN-SS methods while
maintaining a comparable representation ability.
( 2 min )
The unstructured nature of data used in foundation model development is a
challenge to systematic analyses for making data use and documentation
decisions. From a Responsible AI perspective, these decisions often rely upon
understanding how people are represented in data. We propose a framework
designed to guide analysis of human representation in unstructured data and
identify downstream risks. We apply the framework in two toy examples using the
Common Crawl web text corpus (C4) and LAION-400M. We also propose a set of
hypothetical action steps in service of dataset use, development, and
documentation.
( 2 min )
Crop management decision support systems are specialized tools for farmers
that reduce the riskiness of revenue streams, especially valuable for use under
the current climate changes that impact agricultural productivity.
Unfortunately, small farmers in India, who could greatly benefit from these
tools, do not have access to them. In this paper, we model an individual
greenhouse as a Markov Decision Process (MDP) and adapt Li and Li (2019)'s
Follow the Weighted Leader (FWL) online learning algorithm to offer crop
planning advice. We successfully produce utility-preserving cropping pattern
suggestions in simulations. When we compare against an offline planning
algorithm, we achieve the same cumulative revenue with greatly reduced runtime.
( 2 min )
Generative models can produce impressively realistic images. This paper
demonstrates that generated images have geometric features different from those
of real images. We build a set of collections of generated images, prequalified
to fool simple, signal-based classifiers into believing they are real. We then
show that prequalified generated images can be identified reliably by
classifiers that only look at geometric properties. We use three such
classifiers. All three classifiers are denied access to image pixels, and look
only at derived geometric features. The first classifier looks at the
perspective field of the image, the second looks at lines detected in the
image, and the third looks at relations between detected objects and shadows.
Our procedure detects generated images more reliably than SOTA local signal
based detectors, for images from a number of distinct generators. Saliency maps
suggest that the classifiers can identify geometric problems reliably. We
conclude that current generators cannot reliably reproduce geometric properties
of real images.
( 2 min )
Model-agnostic anomaly detection is one of the promising approaches in the
search for new physics beyond the standard model. In this paper, we present
Set-VAE, a particle-based variational autoencoder (VAE) anomaly detection
algorithm. We demonstrate a 2x signal efficiency gain compared with traditional
subjettiness-based jet selection. Furthermore, with an eye to the future
deployment to trigger systems, we propose the CLIP-VAE, which reduces the
inference-time cost of anomaly detection by using the KL-divergence loss as the
anomaly score, resulting in a 2x acceleration in latency and reducing the
caching requirement.
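The CLIP-VAE trick is concrete enough to sketch: rather than reconstructing each jet and scoring reconstruction error, score each event by the closed-form KL term of the VAE objective, which needs only the encoder's outputs. A minimal numpy version (dimensions and values are illustrative):

```python
import numpy as np

def kl_anomaly_score(mu, logvar):
    """Closed-form KL divergence KL(N(mu, diag(sigma^2)) || N(0, I)), used
    directly as the anomaly score so no decoder pass is needed at inference."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

# a jet whose encoding sits at the prior vs. one pushed far away from it
background = kl_anomaly_score(np.zeros(8), np.zeros(8))
anomaly = kl_anomaly_score(np.full(8, 2.0), np.zeros(8))
```

Skipping the decoder is what yields the latency and caching gains reported for trigger-system deployment.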
( 2 min )
Evaluating the accuracy of outputs generated by Large Language Models (LLMs)
is especially important in the climate science and policy domain. We introduce
the Expert Confidence in Climate Statements (ClimateX) dataset, a novel,
curated, expert-labeled dataset consisting of 8094 climate statements collected
from the latest Intergovernmental Panel on Climate Change (IPCC) reports,
labeled with their associated confidence levels. Using this dataset, we show
that recent LLMs can classify human expert confidence in climate-related
statements, especially in a few-shot learning setting, but with limited (up to
47%) accuracy. Overall, models exhibit consistent and significant
over-confidence on low and medium confidence statements. We highlight
implications of our results for climate communication, LLM evaluation
strategies, and the use of LLMs in information retrieval systems.
( 2 min )
Although much work has been done on explainability in the computer vision and
natural language processing (NLP) fields, there is still much work to be done
to explain methods applied to time series, as time series by nature cannot be
understood at first sight.
(DNN) in a teacher-student architecture (distillation model) that offers
interpretability in time-series classification tasks. The explainability of our
approach is based on transforming the time series to 2D plots and applying
image highlight methods (such as LIME and GradCam), making the predictions
interpretable. At the same time, the proposed approach offers accuracy
competitive with the baseline model, at the cost of increased training time.
( 2 min )
Astronomical transients, such as supernovae and other rare stellar
explosions, have been instrumental in some of the most significant discoveries
in astronomy. New astronomical sky surveys will soon record unprecedented
numbers of transients as sparsely and irregularly sampled multivariate time
series. To improve our understanding of the physical mechanisms of transients
and their progenitor systems, early-time measurements are necessary.
Prioritizing the follow-up of transients based on their age along with their
class is crucial for new surveys. To meet this demand, we present the first
method of predicting the age of transients in real-time from multi-wavelength
time-series observations. We build a Bayesian probabilistic recurrent neural
network. Our method can accurately predict the age of a transient with robust
uncertainties as soon as it is initially triggered by a survey telescope. This
work will be essential for the advancement of our understanding of the numerous
young transients being detected by ongoing and upcoming astronomical surveys.
( 2 min )
We introduce a diffusion-based generative model to describe the distribution
of galaxies in our Universe directly as a collection of points in 3-D space
(coordinates) optionally with associated attributes (e.g., velocities and
masses), without resorting to binning or voxelization. The custom diffusion
model can be used both for emulation, reproducing essential summary statistics
of the galaxy distribution, as well as inference, by computing the conditional
likelihood of a galaxy field. We demonstrate a first application to massive
dark matter haloes in the Quijote simulation suite. This approach can be
extended to enable a comprehensive analysis of cosmological data, circumventing
limitations inherent to summary statistic -- as well as neural simulation-based
inference methods.
( 2 min )
The aim of this short note is to show that the Denoising Diffusion Probabilistic
Model (DDPM), a non-homogeneous discrete-time Markov process, can be represented
by a time-homogeneous continuous-time Markov process observed at non-uniformly
sampled discrete times. Surprisingly, this continuous-time Markov process is
the well-known and well-studied Ornstein-Uhlenbeck (OU) process, which was
developed in the 1930s for studying Brownian particles in harmonic potentials. We
establish the formal equivalence between DDPM and the OU process using its
analytical solution. We further demonstrate that the design problem of the
noise scheduler for non-homogeneous DDPM is equivalent to designing observation
times for the OU process. We present several heuristic designs for observation
times based on principled quantities such as auto-variance and Fisher
Information and connect them to ad hoc noise schedules for DDPM. Interestingly,
we show that the Fisher-Information-motivated schedule corresponds exactly to the
cosine schedule, which was developed without any theoretical foundation but is
the current state-of-the-art noise schedule.
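The claimed equivalence can be sketched in two lines (up to notational conventions, since the note's exact parameterization is not reproduced here). The standard OU process $\mathrm{d}X_t = -X_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t$ has the analytical solution

$$ X_t = e^{-t} X_0 + \sqrt{1 - e^{-2t}}\, Z, \qquad Z \sim \mathcal{N}(0, I), $$

while the DDPM forward process gives $x_k = \sqrt{\bar{\alpha}_k}\, x_0 + \sqrt{1 - \bar{\alpha}_k}\,\varepsilon$. The two coincide when the $k$-th observation time satisfies

$$ \bar{\alpha}_k = e^{-2 t_k} \quad\Longleftrightarrow\quad t_k = -\tfrac{1}{2}\log \bar{\alpha}_k, $$

so choosing a noise schedule $\{\bar{\alpha}_k\}$ is exactly choosing non-uniform observation times $\{t_k\}$ of a single time-homogeneous OU process.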
( 2 min )
Diffusion models excel at generating photo-realistic images but come with
significant computational costs in both training and sampling. While various
techniques address these computational challenges, a less-explored issue is
designing an efficient and adaptable network backbone for iterative refinement.
Current options like U-Net and Vision Transformer often rely on
resource-intensive deep networks and lack the flexibility needed for generating
images at variable resolutions or with a smaller network than used in training.
This study introduces LEGO bricks, which seamlessly integrate Local-feature
Enrichment and Global-content Orchestration. These bricks can be stacked to
create a test-time reconfigurable diffusion backbone, allowing selective
skipping of bricks to reduce sampling costs and generate higher-resolution
images than the training data. LEGO bricks enrich local regions with an MLP and
transform them using a Transformer block while maintaining a consistent
full-resolution image across all bricks. Experimental results demonstrate that
LEGO bricks enhance training efficiency, expedite convergence, and facilitate
variable-resolution image generation while maintaining strong generative
performance. Moreover, LEGO significantly reduces sampling time compared to
other methods, establishing it as a valuable enhancement for diffusion models.
( 2 min )
Causal inference studies whether the presence of a variable influences an
observed outcome. As measured by quantities such as the "average treatment
effect," this paradigm is employed across numerous fields, from
vaccine and drug development to policy interventions. Unfortunately, the
majority of these methods are often limited to univariate outcomes. Our work
generalizes causal estimands to outcomes with any number of dimensions or any
measurable space, and formulates traditional causal estimands for nominal
variables as causal discrepancy tests. We propose a simple technique for
adjusting universally consistent conditional independence tests and prove that
these tests are universally consistent causal discrepancy tests. Numerical
experiments illustrate that our method, Causal CDcorr, leads to improvements in
both finite sample validity and power when compared to existing strategies. Our
methods are all open source and available at github.com/ebridge2/cdcorr.
( 2 min )
There are a number of available methods for selecting whom to prioritize for
treatment, including ones based on treatment effect estimation, risk scoring,
and hand-crafted rules. We propose rank-weighted average treatment effect
(RATE) metrics as a simple and general family of metrics for comparing and
testing the quality of treatment prioritization rules. RATE metrics are
agnostic as to how the prioritization rules were derived, and only assess how
well they identify individuals that benefit the most from treatment. We define
a family of RATE estimators and prove a central limit theorem that enables
asymptotically exact inference in a wide variety of randomized and
observational study settings. RATE metrics subsume a number of existing
metrics, including the Qini coefficient, and our analysis directly yields
inference methods for these metrics. We showcase RATE in the context of a
number of applications, including optimal targeting of aspirin to stroke
patients.
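The flavour of a RATE metric can be conveyed on simulated data with known individual effects (in practice effects are unobserved, and RATE relies on doubly robust scores estimated on held-out data, which this sketch omits): rank units by the prioritization rule, then average the gain of treating the top-ranked fraction over treating at random.

```python
import numpy as np

def autoc(priority, effect):
    """Rank-weighted ATE sketch: for each top-q fraction under the rule, take
    the mean effect among the top-ranked units minus the overall mean effect,
    then average over q (an AUTOC-style weighting)."""
    order = np.argsort(-priority)            # highest priority treated first
    sorted_eff = effect[order]
    top_means = np.cumsum(sorted_eff) / np.arange(1, len(effect) + 1)
    return np.mean(top_means - effect.mean())

rng = np.random.default_rng(0)
effect = rng.normal(1.0, 1.0, size=1000)     # simulated individual effects
oracle = autoc(effect, effect)               # rank by the true effect itself
random_rule = autoc(rng.random(1000), effect)
```

A rule with no ability to identify who benefits most scores near zero, while any informative prioritization scores positively, which is what makes the metric agnostic to how the rule was derived.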
( 2 min )
Synthetic data (SD) have garnered attention as a privacy enhancing
technology. Unfortunately, there is no standard for quantifying their degree of
privacy protection. In this paper, we discuss proposed quantification
approaches. This contributes to the development of SD privacy standards;
stimulates multi-disciplinary discussion; and helps SD researchers make
informed modeling and evaluation decisions.
( 2 min )
We believe generative AI has the potential over time to transform virtually every customer experience we know. The number of companies launching generative AI applications on AWS is substantial and building quickly, including adidas, Booking.com, Bridgewater Associates, Clariant, Cox Automotive, GoDaddy, and LexisNexis Legal & Professional, to name just a few. Innovative startups like Perplexity […]
( 26 min )
Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models at scale. SageMaker makes it easy to deploy models into production directly through API calls to the service. Models are packaged into containers for robust and scalable deployments. SageMaker provides […]
( 12 min )
Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and effortlessly build, train, and deploy machine learning (ML) models at any scale. SageMaker makes it straightforward to deploy models into production directly through API calls to the service. Models are packaged into containers for robust and scalable deployments. Although […]
( 15
min )
Today, we are excited to announce support for Code Editor, a new integrated development environment (IDE) option in Amazon SageMaker Studio. Code Editor is based on Code-OSS, Visual Studio Code Open Source, and provides access to the familiar environment and tools of the popular IDE that machine learning (ML) developers know and love, fully integrated […]
( 9
min )
As democratization of foundation models (FMs) becomes more prevalent and demand for AI-augmented services increases, software as a service (SaaS) providers are looking to use machine learning (ML) platforms that support multiple tenants—for data scientists internal to their organization and external customers. More and more companies are realizing the value of using FMs to generate […]
( 17
min )
As organizations deploy models to production, they are constantly looking for ways to optimize the performance of their foundation models (FMs) running on the latest accelerators, such as AWS Inferentia and GPUs, so they can reduce their costs and decrease response latency to provide the best experience to end-users. However, some FMs don’t fully utilize […]
( 13
min )
Amazon SageMaker makes it straightforward to deploy machine learning (ML) models for real-time inference and offers a broad selection of ML instances spanning CPUs and accelerators such as AWS Inferentia. As a fully managed service, you can scale your model deployments, minimize inference costs, and manage your models more effectively in production with reduced operational […]
( 6
min )
Amazon SageMaker Canvas is a no-code workspace that enables analysts and citizen data scientists to generate accurate machine learning (ML) predictions for their business needs. Starting today, SageMaker Canvas supports advanced model build configurations such as selecting a training method (ensemble or hyperparameter optimization) and algorithms, customizing the training and validation data split ratio, and […]
( 12
min )
Building foundation models (FMs) requires building, maintaining, and optimizing large clusters to train models with tens to hundreds of billions of parameters on vast amounts of data. Creating a resilient environment that can handle failures and environmental changes without losing days or weeks of model training progress is an operational challenge that requires you to […]
( 10
min )
Digital publishers are continuously looking for ways to streamline and automate their media workflows to generate and publish new content as rapidly as they can, but without foregoing quality. Adding images to capture the essence of text can improve the reading experience. Machine learning techniques can help you discover such images. “A striking image is […]
( 10
min )
The risks associated with generative AI have been well-publicized. Toxicity, bias, escaped PII, and hallucinations negatively impact an organization’s reputation and damage customer trust. Research shows that not only do risks for bias and toxicity transfer from pre-trained foundation models (FM) to task-specific generative AI services, but that tuning an FM for specific tasks, on […]
( 13
min )
Data preparation is a crucial step in any machine learning (ML) workflow, yet it often involves tedious and time-consuming tasks. Amazon SageMaker Canvas now supports comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler. With this integration, SageMaker Canvas provides customers with an end-to-end no-code workspace to prepare data, build and use ML and […]
( 7
min )
In the last few years Large Language Models (LLMs) have risen to prominence as outstanding tools capable of understanding, generating and manipulating text with unprecedented proficiency. Their potential applications span from conversational agents to content generation and information retrieval, holding the promise of revolutionizing all industries. However, harnessing this potential while ensuring the responsible and […]
( 15
min )
In today’s rapidly evolving landscape of artificial intelligence, deep learning models have found themselves at the forefront of innovation, with applications spanning computer vision (CV), natural language processing (NLP), and recommendation systems. However, the increasing cost associated with training and fine-tuning these models poses a challenge for enterprises. This cost is primarily driven by the […]
( 8
min )
In November 2023, MarketsandMarkets announced the publication of its Knowledge Graph Market report. In its announcement, M&M estimated the 2023 global knowledge graph market at $0.9 billion, forecasting market growth to $2.4 billion by 2028, a compound annual growth rate of 21.9 percent. M&M also listed these 12 “key players” in its announcement: I haven’t… Read More »A few large enterprise software provider strategies for the knowledge graph market
The post A few large enterprise software provider strategies for the knowledge graph market appeared first on Data Science Central.
( 21
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Predicting the infiltration of Glioblastoma (GBM) from medical MRI scans is
crucial for understanding tumor growth dynamics and designing personalized
radiotherapy treatment plans. Mathematical models of GBM growth can complement
the data in the prediction of spatial distributions of tumor cells. However,
this requires estimating patient-specific parameters of the model from clinical
data, which is a challenging inverse problem due to limited temporal data and
the limited time between imaging and diagnosis. This work proposes a method
that uses Physics-Informed Neural Networks (PINNs) to estimate patient-specific
parameters of a reaction-diffusion PDE model of GBM growth from a single 3D
structural MRI snapshot. PINNs embed both the data and the PDE into a loss
function, thus integrating theory and data. Key innovations include the
identification and estimation of characteristic non-dimensional parameters, a
pre-training step that utilizes the non-dimensional parameters and a
fine-tuning step to determine the patient-specific parameters. Additionally,
the diffuse domain method is employed to handle the complex brain geometry
within the PINN framework. Our method is validated both on synthetic and
patient datasets, and shows promise for real-time parametric inference in the
clinical setting for personalized GBM treatment.
( 2
min )
Electroanatomical mapping is a technique used in cardiology to create a
detailed 3D map of the electrical activity in the heart. It is useful for
diagnosis, treatment planning and real time guidance in cardiac ablation
procedures to treat arrhythmias like atrial fibrillation. A probabilistic
machine learning model trained on a library of CT/MRI scans of the heart can be
used during electroanatomical mapping to generate a patient-specific 3D model
of the chamber being mapped. The use of probabilistic machine learning models
under a Bayesian framework provides a way to quantify uncertainty in results
and provide a natural framework of interpretability of the model. Here we
introduce a Bayesian approach to surface reconstruction of cardiac chamber
models from a sparse 3D point cloud data acquired during electroanatomical
mapping. We show how probabilistic graphical models trained on segmented CT/MRI
data can be used to generate cardiac chamber models from few acquired locations
thereby reducing procedure time and x-ray exposure. We show how they provide
insight into what the neural network learns from the segmented CT/MRI images
used to train the network, which provides explainability to the resulting
cardiac chamber models generated by the model.
( 2
min )
We study the sample complexity of identifying the pure strategy Nash
equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally,
we are given a stochastic model where any learner can sample an entry $(i,j)$
of the input matrix $A\in[-1,1]^{n\times m}$ and observe $A_{i,j}+\eta$ where
$\eta$ is a zero-mean 1-sub-Gaussian noise. The aim of the learner is to
identify the PSNE of $A$, whenever it exists, with high probability while
taking as few samples as possible. Zhou et al. (2017) present an
instance-dependent sample complexity lower bound that depends only on the
entries in the row and column in which the PSNE lies. We design a near-optimal
algorithm whose sample complexity matches the lower bound, up to log factors.
The problem of identifying the PSNE also generalizes the problem of pure
exploration in stochastic multi-armed bandits and dueling bandits, and our
result matches the optimal bounds, up to log factors, in both settings.
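For intuition about the object being identified: in the noiseless case, a PSNE of a zero-sum matrix game is simply a saddle point of the payoff matrix. A minimal sketch of that check (illustrative only; not the paper's sampling algorithm):

```python
import numpy as np

def find_psne(A):
    """Return the pure strategy Nash equilibrium (saddle point) of a
    zero-sum matrix game A, or None if no PSNE exists.

    At a saddle point (i, j), A[i, j] is simultaneously the maximum of
    column j (the row player cannot improve) and the minimum of row i
    (the column player cannot improve)."""
    n, m = A.shape
    for i in range(n):
        for j in range(m):
            if A[i, j] == A[:, j].max() and A[i, j] == A[i, :].min():
                return i, j
    return None

# A game with a saddle point at (0, 1): the entry 2 is the maximum of
# its column and the minimum of its row.
A = np.array([[3.0, 2.0],
              [1.0, 0.0]])
```

In the paper's stochastic setting each query of `A[i, j]` is corrupted by sub-Gaussian noise, so the learner must instead decide which entries to sample and how often before running a check like this.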
( 2
min )
Large language models (LLMs) aligned to human preferences via reinforcement
learning from human feedback (RLHF) underpin many commercial applications of
LLM technology. Despite this, the impacts of RLHF on LLM internals remain
opaque. We propose a novel method for interpreting implicit reward models
(IRMs) in LLMs learned through RLHF. Our approach trains pairs of autoencoders
on activations from a base LLM and its RLHF-tuned variant. Through a comparison
of autoencoder hidden spaces, we identify features that reflect the accuracy of
the learned IRM. To illustrate our method, we fine-tune an LLM via RLHF to
learn a token-utility mapping and maximize the aggregate utility of generated
text. This is the first application of sparse autoencoders to interpreting
IRMs. Our method provides an abstract approximation of reward integrity and
holds promise for measuring alignment between specified objectives and learned
model behaviors.
( 2
min )
Many problems in machine learning can be formulated as solving
entropy-regularized optimal transport on the space of probability measures. The
canonical approach involves the Sinkhorn iterates, renowned for their rich
mathematical properties. Recently, the Sinkhorn algorithm has been recast
within the mirror descent framework, thus benefiting from classical
optimization theory insights. Here, we build upon this result by introducing a
continuous-time analogue of the Sinkhorn algorithm. This perspective allows us
to derive novel variants of Sinkhorn schemes that are robust to noise and bias.
Moreover, our continuous-time dynamics not only generalize but also offer a
unified perspective on several recently discovered dynamics in machine learning
and mathematics, such as the "Wasserstein mirror flow" of (Deb et al. 2023) or
the "mean-field Schr\"odinger equation" of (Claisse et al. 2023).
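For reference, the discrete Sinkhorn iterates that the continuous-time dynamics generalize can be sketched in a few lines (a standard textbook implementation, not the paper's continuous-time variant):

```python
import numpy as np

def sinkhorn(C, a, b, eps=1.0, n_iter=500):
    """Sinkhorn iterates for entropy-regularized optimal transport
    between histograms a and b, with cost matrix C and regularization eps."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                 # scale to match column marginals
        u = a / (K @ v)                   # scale to match row marginals
    return u[:, None] * K * v[None, :]    # transport plan

# Uniform marginals on three points with squared-distance cost.
x = np.array([0.0, 1.0, 2.0])
C = (x[:, None] - x[None, :]) ** 2
a = b = np.ones(3) / 3
P = sinkhorn(C, a, b)
```

Each iteration alternately projects onto the row- and column-marginal constraints, which is exactly the alternating structure that the mirror descent recasting makes precise.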
( 2
min )
In climate simulations, small-scale processes shape ocean dynamics but remain
computationally expensive to resolve directly. For this reason, their
contributions are commonly approximated using empirical parameterizations,
which lead to significant errors in long-term projections. In this work, we
develop parameterizations based on Fourier Neural Operators, showcasing their
accuracy and generalizability in comparison to other approaches. Finally, we
discuss the potential and limitations of neural networks operating in the
frequency domain, paving the way for future investigation.
( 2
min )
Missing data is a common problem in practical settings. Various imputation
methods have been developed to deal with missing data. However, even though the
label is usually available in the training data, the common practice of
imputation usually only relies on the input and ignores the label. In this
work, we illustrate how stacking the label into the input can significantly
improve the imputation of the input. In addition, we propose a classification
strategy that initializes the predicted test label with missing values and
stacks the label with the input for imputation. This allows imputing the label
and the input at the same time. Also, the technique is capable of handling data
training with missing labels without any prior imputation and is applicable to
continuous, categorical, or mixed-type data. Experiments show promising results
in terms of accuracy.
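The core idea, that the label carries information useful for imputing the input, can be illustrated with a toy sketch in which per-class means stand in for a full imputation model (hypothetical data, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the feature is strongly tied to the binary label, so an
# imputer that sees the label should recover missing entries better.
n = 500
y = rng.integers(0, 2, n)
x = y + 0.1 * rng.normal(size=n)

miss = rng.random(n) < 0.3          # 30% of x goes missing
x_obs = np.where(miss, np.nan, x)

# Label-blind imputation: overall mean of the observed values.
plain = np.where(miss, np.nanmean(x_obs), x_obs)

# Label-aware imputation: mean of observed values within each class,
# i.e. the label is "stacked" next to the input before imputing.
stacked = x_obs.copy()
for c in (0, 1):
    stacked[miss & (y == c)] = np.nanmean(x_obs[y == c])

err_plain = np.abs(plain[miss] - x[miss]).mean()
err_stacked = np.abs(stacked[miss] - x[miss]).mean()
```

On this data the label-aware error is a fraction of the label-blind error, which is the effect the paper exploits with full imputation models rather than class means.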
( 2
min )
Rodney Brooks, co-founder of iRobot, kicks off an MIT symposium on the promise and potential pitfalls of increasingly powerful AI tools like ChatGPT.
( 12
min )
Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. Amazon SageMaker notebook jobs allow data scientists to run their notebooks on demand or on a schedule with a few clicks in SageMaker Studio. With this launch, you can programmatically run notebooks as jobs […]
( 11
min )
The rapid growth of generative AI brings promising new innovation, and at the same time raises new challenges. These challenges include some that were common before generative AI, such as bias and explainability, and new ones unique to foundation models (FMs), including hallucination and toxicity. At AWS, we are committed to developing generative AI responsibly, […]
( 9
min )
Since launching in June 2023, the AWS Generative AI Innovation Center team of strategists, data scientists, machine learning (ML) engineers, and solutions architects have worked with hundreds of customers worldwide, and helped them ideate, prioritize, and build bespoke solutions that harness the power of generative AI. Customers worked closely with us to prioritize use cases, […]
( 4
min )
Mira Murati as CTO, Greg Brockman returns as President. Read messages from CEO Sam Altman and board chair Bret Taylor.
( 5
min )
The magnitude of a metric space was recently established as a novel
invariant, providing a measure of the `effective size' of a space across
multiple scales. By capturing both geometrical and topological properties of
data, magnitude is poised to address challenges in unsupervised representation
learning tasks. We formalise a novel notion of dissimilarity between magnitude
functions of finite metric spaces and use them to derive a quality measure for
dimensionality reduction tasks. Our measure is provably stable under
perturbations of the data, can be efficiently calculated, and enables a
rigorous multi-scale comparison of embeddings. We show the utility of our
measure in an experimental suite that comprises different domains and tasks,
including the comparison of data visualisations.
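For concreteness, the magnitude of a finite metric space at scale t is the total sum of the entries of the inverse of the similarity matrix Z with Z_ij = exp(-t d(x_i, x_j)); sweeping t yields the magnitude function. A minimal sketch:

```python
import numpy as np

def magnitude(X, t):
    """Magnitude of the finite metric space X at scale t: the sum of the
    entries of the inverse similarity matrix Z, Z_ij = exp(-t * d_ij)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return float(np.linalg.inv(np.exp(-t * D)).sum())

# The magnitude function captures the 'effective number of points'
# across scales: it tends to the cardinality of X as t grows.
X = np.random.default_rng(0).normal(size=(20, 2))
mags = [magnitude(X, t) for t in (0.1, 1.0, 10.0, 200.0)]
```

The paper's contribution is a dissimilarity between such magnitude functions, used to score how well an embedding preserves multi-scale structure.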
( 2
min )
Motivated by applications in text mining and discrete distribution inference,
we investigate the testing for equality of probability mass functions of $K$
groups of high-dimensional multinomial distributions. A test statistic, which
is shown to have an asymptotic standard normal distribution under the null, is
proposed. The optimal detection boundary is established, and the proposed test
is shown to achieve this optimal detection boundary across the entire parameter
space of interest. The proposed method is demonstrated in simulation studies
and applied to analyze two real-world datasets to examine variation among
consumer reviews of Amazon movies and diversity of statistical paper abstracts.
( 2
min )
In the multi-armed bandit framework, there are two formulations that are
commonly employed to handle time-varying reward distributions: adversarial
bandit and nonstationary bandit. Although their oracles, algorithms, and regret
analysis differ significantly, we provide a unified formulation in this paper
that smoothly bridges the two as special cases. The formulation uses an oracle
that takes the best-fixed arm within time windows. Depending on the window
size, it turns into the oracle in hindsight in the adversarial bandit and
dynamic oracle in the nonstationary bandit. We provide algorithms that attain
the optimal regret with the matching lower bound.
( 2
min )
Although gradient descent with momentum is widely used in modern deep
learning, a concrete understanding of its effects on the training trajectory
still remains elusive. In this work, we empirically show that momentum gradient
descent with a large learning rate and learning rate warmup displays large
catapults, driving the iterates towards flatter minima than those found by
gradient descent. We then provide empirical evidence and theoretical intuition
that the large catapult is caused by momentum "amplifying" the
self-stabilization effect (Damian et al., 2023).
( 2
min )
The exploration-exploitation dilemma has been a central challenge in
reinforcement learning (RL) with complex model classes. In this paper, we
propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound
(MQL-UCB) for RL with general function approximation. Our key algorithmic
design includes (1) a general deterministic policy-switching strategy that
achieves low switching cost, (2) a monotonic value function structure with
carefully controlled function class complexity, and (3) a variance-weighted
regression scheme that exploits historical trajectories with high data
efficiency. MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$
when $K$ is sufficiently large and near-optimal policy switching cost of
$\tilde{O}(dH)$, with $d$ being the eluder dimension of the function class, $H$
being the planning horizon, and $K$ being the number of episodes.
Our work sheds light on designing provably sample-efficient and
deployment-efficient Q-learning with nonlinear function approximation.
( 2
min )
Constrained optimization of the parameters of a simulator plays a crucial
role in a design process. These problems become challenging when the simulator
is stochastic, computationally expensive, and the parameter space is
high-dimensional. One can efficiently perform optimization only by utilizing
the gradient with respect to the parameters, but these gradients are
unavailable in many legacy, black-box codes. We introduce the algorithm
Scout-Nd (Stochastic Constrained Optimization for N dimensions) to tackle the
issues mentioned earlier by efficiently estimating the gradient, reducing the
noise of the gradient estimator, and applying multi-fidelity schemes to further
reduce computational effort. We validate our approach on standard benchmarks,
demonstrating its effectiveness in optimizing parameters and its better
performance compared to existing methods.
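The gradient-estimation ingredient can be sketched with a standard Gaussian-smoothing (score-function) estimator with antithetic sampling, one common way to obtain gradients from a black-box objective (an illustrative stand-in, not the authors' exact estimator):

```python
import numpy as np

def smoothed_grad(f, theta, sigma=0.1, n=64, rng=None):
    """Gaussian-smoothing gradient estimator for a black-box objective f:
    no gradients of f are required. Antithetic (paired +/-) perturbations
    reduce the variance of the estimator."""
    rng = rng or np.random.default_rng(0)
    eps = rng.normal(size=(n, theta.size))
    fp = np.array([f(theta + sigma * e) for e in eps])
    fm = np.array([f(theta - sigma * e) for e in eps])
    return ((fp - fm)[:, None] * eps).mean(0) / (2 * sigma)

# Sanity check on a quadratic: the gradient of f(x) = ||x||^2 at theta
# is 2 * theta.
theta = np.array([1.0, -2.0])
g = smoothed_grad(lambda x: (x ** 2).sum(), theta, n=2048)
```

In a multi-fidelity scheme like the one the abstract describes, cheap low-fidelity evaluations of `f` would absorb most of these queries.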
( 2
min )
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly
popular in financial applications, owing to certain desirable properties that
it enjoys. We consider the problem of estimating UBSR in a recursive setting,
where samples from the underlying loss distribution are available
one-at-a-time. We cast the UBSR estimation problem as a root finding problem,
and propose stochastic approximation-based estimation schemes. We derive
non-asymptotic bounds on the estimation error in the number of samples. We also
consider the problem of UBSR optimization within a parameterized class of
random variables. We propose a stochastic gradient descent based algorithm for
UBSR optimization, and derive non-asymptotic bounds on its convergence.
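Under one common convention, the UBSR of a loss X with loss function ℓ and threshold λ is the root t* of g(t) = E[ℓ(X − t)] − λ, which a Robbins-Monro scheme can track from one sample at a time. A hedged sketch (the increment clipping is a stabilizer added here, not taken from the paper):

```python
import numpy as np

def ubsr_sa(samples, loss, lam, t0=0.0, clip=4.0):
    """Stochastic approximation (Robbins-Monro) estimate of UBSR:
    follow the root of g(t) = E[loss(X - t)] - lam with step size 1/k,
    processing loss samples one at a time."""
    t = t0
    for k, x in enumerate(samples, start=1):
        incr = np.clip(loss(x - t) - lam, -clip, clip)  # stabilized increment
        t += incr / k
    return t

# Exponential loss and X ~ N(0, 1): the root solves E[exp(X - t)] = lam,
# giving t* = 1/2 - log(lam), i.e. t* = 0.5 for lam = 1.
rng = np.random.default_rng(0)
t_hat = ubsr_sa(rng.normal(size=200_000), np.exp, lam=1.0)
```

The paper's non-asymptotic bounds quantify how fast such a recursion concentrates around t* as a function of the number of samples.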
( 2
min )
In a high-dimensional regression framework, we study consequences of the
naive two-step procedure where first the dimension of the input variables is
reduced and second, the reduced input variables are used to predict the output
variable with kernel regression. In order to analyze the resulting regression
errors, a novel stability result for kernel regression with respect to the
Wasserstein distance is derived. This allows us to bound errors that occur when
perturbed input data is used to fit the regression function. We apply the
general stability result to principal component analysis (PCA). Exploiting
known estimates from the literature on both principal component analysis and
kernel regression, we deduce convergence rates for the two-step procedure. The
latter turns out to be particularly useful in a semi-supervised setting.
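The two-step procedure can be sketched end to end: PCA via the SVD, then Gaussian-kernel ridge regression on the reduced inputs (synthetic data with a low-dimensional signal subspace; the hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Inputs lie near a 2-dimensional subspace of R^10, and the response
# depends only on that subspace, so PCA should lose little.
n, d, r = 300, 10, 2
W = rng.normal(size=(d, r))
Z = rng.normal(size=(n, r))
X = Z @ W.T + 0.05 * rng.normal(size=(n, d))
y = np.sin(Z[:, 0]) + 0.1 * rng.normal(size=n)

# Step 1: reduce dimension with PCA (top-r right singular vectors).
Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_red = Xc @ Vt[:r].T

# Step 2: Gaussian-kernel ridge regression on the reduced inputs.
def krr_fit_predict(Xtr, ytr, Xte, gamma=0.1, alpha=1e-2):
    """Kernel ridge regression; alpha controls the ridge regularization."""
    k = lambda A, B: np.exp(-gamma * ((A[:, None] - B[None]) ** 2).sum(-1))
    coef = np.linalg.solve(k(Xtr, Xtr) + alpha * np.eye(len(Xtr)), ytr)
    return k(Xte, Xtr) @ coef

m = 200                       # train/test split
pred = krr_fit_predict(X_red[:m], y[:m], X_red[m:])
mse = np.mean((pred - y[m:]) ** 2)
```

The stability result in the paper bounds how much the kernel-regression error can grow when the (perturbed) PCA coordinates replace the original inputs, which is exactly the error this pipeline incurs.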
( 2
min )
Density power divergence (DPD) is designed to robustly estimate the
underlying distribution of observations, in the presence of outliers. However,
DPD involves an integral of the power of the parametric density models to be
estimated; the explicit form of the integral term can be derived only for
specific densities, such as normal and exponential densities. While we may
perform a numerical integration for each iteration of the optimization
algorithms, the computational complexity has hindered the practical application
of DPD-based estimation to more general parametric densities. To address the
issue, this study introduces a stochastic approach to minimize DPD for general
parametric density models. The proposed approach also can be employed to
minimize other density power-based $\gamma$-divergences, by leveraging
unnormalized models.
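The key identity behind a stochastic treatment is that the intractable term ∫ f_θ(x)^{1+α} dx equals E_{X∼f_θ}[f_θ(X)^α], so sampling from the model can replace numerical integration. A sketch, checked against the closed form for the normal density (a plain Monte Carlo stand-in, not the paper's algorithm):

```python
import numpy as np
from math import pi, sqrt

def dpd_integral_mc(logpdf, sampler, alpha, n=200_000, rng=None):
    """Monte Carlo estimate of the DPD integral term
    int f(x)^(1+alpha) dx = E_{X~f}[ f(X)^alpha ],
    using only samples drawn from the model density f."""
    x = sampler(n, rng)
    return np.exp(alpha * logpdf(x)).mean()

# Standard normal check against the closed form
# (2*pi*sigma^2)^(-alpha/2) / sqrt(1 + alpha), with sigma = 1.
alpha = 0.5
logpdf = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * pi)
sampler = lambda n, rng: (rng or np.random.default_rng(0)).normal(size=n)
est = dpd_integral_mc(logpdf, sampler, alpha)
exact = (2 * pi) ** (-alpha / 2) / sqrt(1 + alpha)
```

For densities without a closed-form power integral, this estimator is what makes each optimization iteration tractable.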
( 2
min )
We study the long time behavior of an underdamped mean-field Langevin (MFL)
equation, and provide a general convergence as well as an exponential
convergence rate result under different conditions. The results on the MFL
equation can be applied to study the convergence of the Hamiltonian gradient
descent algorithm for the overparametrized optimization. We then provide a
numerical example of the algorithm to train a generative adversarial networks
(GAN).
( 2
min )
We consider the gradient descent flow widely used for the minimization of the
$\mathcal{L}^2$ cost function in Deep Learning networks, and introduce two
modified versions; one adapted for the overparametrized setting, and the other
for the underparametrized setting. Both have a clear and natural invariant
geometric meaning, taking into account the pullback vector bundle structure in
the overparametrized, and the pushforward vector bundle structure in the
underparametrized setting. In the overparametrized case, we prove that,
provided that a rank condition holds, all orbits of the modified gradient
descent drive the $\mathcal{L}^2$ cost to its global minimum at a uniform
exponential convergence rate. We point out relations of the latter to
sub-Riemannian geometry.
( 2
min )
The convergence of deterministic policy gradient under the Hadamard
parameterization is studied in the tabular setting and the linear convergence
of the algorithm is established. To this end, we first show that the error
decreases at an $O(\frac{1}{k})$ rate for all the iterations. Based on this
result, we further show that the algorithm has a faster local linear
convergence rate after $k_0$ iterations, where $k_0$ is a constant that only
depends on the MDP problem and the initialization. To show the local linear
convergence of the algorithm, we have indeed established the contraction of the
sub-optimal probability $b_s^k$ (i.e., the probability of the output policy
$\pi^k$ on non-optimal actions) when $k\ge k_0$.
( 2
min )
Navigating dynamic physical environments without obstructing or damaging
human assets is of quintessential importance for social robots. In this work,
we solve autonomous drone navigation's sub-problem of predicting out-of-domain
human and agent trajectories using a deep generative model. Our method,
General-PECNet (G-PECNet), improves on the 2020 benchmark PECNet by 9.5\% in
Final Displacement Error (FDE) through a combination of architectural
improvements inspired by periodic activation functions and synthetic
trajectory (data) augmentations using Hidden Markov Models (HMMs) and
Reinforcement Learning (RL). Additionally, we propose a simple
geometry-inspired metric for trajectory non-linearity and outlier detection,
helpful for the task. Code available at
$\href{https://github.com/Aryan-Garg/PECNet-Pedestrian-Trajectory-Prediction.git}{GitHub}$
( 2
min )
Federated learning is a new learning paradigm that decouples data collection
and model training via multi-party computation and model aggregation. As a
flexible learning setting, federated learning has the potential to integrate
with other learning frameworks. We conduct a focused survey of federated
learning in conjunction with other learning algorithms. Specifically, we
explore various learning algorithms to improve the vanilla federated averaging
algorithm and review model fusion methods such as adaptive aggregation,
regularization, clustered methods, and Bayesian methods. Following the emerging
trends, we also discuss federated learning in the intersection with other
learning paradigms, termed federated X learning, where X includes multitask
learning, meta-learning, transfer learning, unsupervised learning, and
reinforcement learning. This survey reviews the state of the art, challenges,
and future directions.
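As a reference point, the vanilla federated averaging step that the surveyed methods improve upon is a dataset-size-weighted average of client parameters:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Vanilla federated averaging: aggregate each client's model
    parameters, weighted by the size of its local dataset."""
    total = sum(client_sizes)
    return [
        sum(n / total * w[i] for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Three clients, each holding a two-tensor model (e.g. weights + bias).
w_a = [np.ones((2, 2)), np.zeros(2)]
w_b = [3 * np.ones((2, 2)), np.ones(2)]
w_c = [np.ones((2, 2)), np.ones(2)]
global_w = fed_avg([w_a, w_b, w_c], client_sizes=[100, 100, 200])
```

Adaptive aggregation, regularization, clustering, and Bayesian fusion all replace or reweight this simple averaging rule.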
( 2
min )
As the adoption of Artificial Intelligence (AI) systems within the clinical
environment grows, limitations in bandwidth and compute can create
communication bottlenecks when streaming imaging data, leading to delays in
patient care and increased cost. As such, healthcare providers and AI vendors
will require greater computational infrastructure, thereby dramatically
increasing costs. To that end, we developed ISLE, an intelligent streaming
framework for high-throughput, compute- and bandwidth-optimized, and
cost-effective AI inference for clinical decision making at scale. In our
experiments, ISLE on average reduced data transmission by 98.02% and decoding
time by 98.09%, while increasing throughput by 2,730%. We show that ISLE
results in faster turnaround times and reduced overall cost of data
transmission and compute, without negatively impacting clinical decision
making using AI systems.
( 2
min )
The thrombotic microangiopathies (TMAs) manifest in renal biopsy histology
with a broad spectrum of acute and chronic findings. Precise diagnostic
criteria for a renal biopsy diagnosis of TMA are missing. As a first step
towards a machine learning- and computer vision-based analysis of whole slide
images from renal biopsies, we trained a segmentation model for the decisive
diagnostic kidney tissue compartments (artery, arteriole, glomerulus) on a set of
whole slide images from renal biopsies with TMAs and Mimickers (distinct
diseases with a nephropathological appearance similar to TMA, such as severe
benign nephrosclerosis, various vasculitides, Bevacizumab-plug glomerulopathy,
arteriolar light chain deposition disease). Our segmentation model combines a
U-Net-based tissue detection with a Shifted windows-transformer architecture to
reach excellent segmentation results for even the most severely altered
glomeruli, arterioles and arteries, even on unseen staining domains from a
different nephropathology lab. With accurate automatic segmentation of the
decisive renal biopsy compartments in human renal vasculopathies, we have laid
the foundation for large-scale compartment-specific machine learning and
computer vision analysis of renal biopsy repositories with TMAs.
( 3
min )
Explainable Artificial Intelligence (XAI) is targeted at understanding how
models perform feature selection and derive their classification decisions.
This paper explores post-hoc explanations for deep neural networks in the audio
domain. Notably, we present a novel Open Source audio dataset consisting of
30,000 audio samples of English spoken digits which we use for classification
tasks on spoken digits and speakers' biological sex. We use the popular XAI
technique Layer-wise Relevance Propagation (LRP) to identify relevant features
for two neural network architectures that process either waveform or
spectrogram representations of the data. Based on the relevance scores obtained
from LRP, hypotheses about the neural networks' feature selection are derived
and subsequently tested through systematic manipulations of the input data.
Further, we take a step beyond visual explanations and introduce audible
heatmaps. We demonstrate the superior interpretability of audible explanations
over visual ones in a human user study.
( 2
min )
In the field of statistical physics, machine learning has gained significant
popularity and has achieved remarkable results in recent studies on phase
transitions. In this paper, we apply Principal Component Analysis (PCA) and
Autoencoders (AE), both unsupervised learning methods, to study the various
configurations of the percolation model in equilibrium phase transition. In
certain phase transition models, such as the DP model in non-equilibrium phase
transitions, the order parameter is particle density. However, in some other
phase transition models, such as the percolation model, it is not. This study
randomized and selected percolation graphs as input for a
neural network and analyzed the results, indicating that the outputs
of the AE's single latent variable and the first principal component of PCA
are signals related to particle density.
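The PCA half of the analysis can be sketched on synthetic site-percolation configurations; with uncorrelated sites (a simplification of the paper's setup), the first principal component indeed tracks particle density:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random site-percolation configurations on an L x L lattice, at
# occupation probabilities spanning the transition region.
L, n_per_p = 16, 50
ps = np.linspace(0.3, 0.9, 13)
configs, density = [], []
for p in ps:
    grids = (rng.random((n_per_p, L, L)) < p).astype(float)
    configs.append(grids.reshape(n_per_p, -1))
    density.extend(grids.mean(axis=(1, 2)))
X = np.vstack(configs)
density = np.array(density)

# First principal component via SVD of the centered data.
Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]

# The leading component should correlate strongly with particle density.
corr = abs(np.corrcoef(pc1, density)[0, 1])
```

The interesting point in the abstract is that the AE's single latent variable extracts the same density-like signal without being told to.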
( 2
min )
We introduce a generalizable approach that combines perturbation method and
one-shot transfer learning to solve nonlinear ODEs with a single polynomial
term, using Physics-Informed Neural Networks (PINNs). Our method transforms
non-linear ODEs into linear ODE systems, trains a PINN across varied
conditions, and offers a closed-form solution for new instances within the same
non-linear ODE class. We demonstrate the effectiveness of this approach on the
Duffing equation and suggest its applicability to similarly structured PDEs and
ODE systems.
( 2
min )
In recent years, Large Language Models (LLM) have emerged as pivotal tools in
various applications. However, these models are susceptible to adversarial
prompt attacks, where attackers can carefully curate input strings that lead to
undesirable outputs. The inherent vulnerability of LLMs stems from their
input-output mechanisms, especially when presented with strongly
out-of-distribution (OOD) inputs. This paper proposes a token-level detection
method to identify adversarial prompts, leveraging the LLM's capability to
predict the next token's probability. We measure the degree of the model's
perplexity and incorporate neighboring token information to encourage the
detection of contiguous adversarial prompt sequences. As a result, we propose
two methods: one that identifies each token as either being part of an
adversarial prompt or not, and another that estimates the probability of each
token being part of an adversarial prompt.
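A toy sketch of the windowed-perplexity idea, with hand-made per-token log-probabilities standing in for an LLM's next-token predictions (the window size and threshold are illustrative, not the paper's settings):

```python
import numpy as np

def flag_adversarial_tokens(token_logprobs, window=3, threshold=5.0):
    """Token-level detection sketch: flag tokens whose windowed average
    negative log-likelihood exceeds a threshold. Averaging over
    neighboring tokens favors detecting contiguous high-perplexity runs,
    which is how adversarial suffixes tend to appear."""
    nll = -np.asarray(token_logprobs)
    pad = window // 2
    padded = np.pad(nll, pad, mode="edge")
    # Centered moving average of the per-token NLL.
    smoothed = np.convolve(padded, np.ones(window) / window, mode="valid")
    return smoothed > threshold

# Toy log-probs: a fluent prefix (high probability), then a run of
# near-random tokens, as an adversarial suffix might produce.
logprobs = [-1.2, -0.8, -2.0, -1.5, -9.1, -8.7, -9.8, -8.9]
flags = flag_adversarial_tokens(logprobs)
```

In the real method the log-probabilities come from the target LLM itself, and the second proposed variant outputs per-token probabilities rather than hard flags.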
( 2
min )
Zero-shot Dialogue State Tracking (DST) addresses the challenge of acquiring
and annotating task-oriented dialogues, which can be time-consuming and costly.
However, DST extends beyond simple slot-filling and requires effective updating
strategies for tracking dialogue state as conversations progress. In this
paper, we propose ParsingDST, a new In-Context Learning (ICL) method, to
introduce additional intricate updating strategies in zero-shot DST. Our
approach reformulates the DST task by leveraging powerful Large Language Models
(LLMs) and translating the original dialogue text to JSON through semantic
parsing as an intermediate state. We also design a novel framework that
includes more modules to ensure the effectiveness of updating strategies in the
text-to-JSON process. Experimental results demonstrate that our approach
outperforms existing zero-shot DST methods on MultiWOZ, exhibiting significant
improvements in Joint Goal Accuracy (JGA) and slot accuracy compared to
existing ICL methods. Our code has been released.
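The JSON intermediate state makes updating explicit; here is a minimal sketch of the update step (the slot schema and the `_delete` convention are hypothetical, and the paper's framework adds further modules around this):

```python
import json

def update_dialogue_state(state, parsed_turn):
    """Apply one turn's semantic-parse output to a JSON dialogue state.

    `parsed_turn` is the JSON the LLM emits for the latest utterance, e.g.
    {"hotel": {"area": "north"}, "_delete": ["hotel-parking"]}. Slots are
    merged domain by domain instead of overwriting the whole state, so
    earlier turns survive unless explicitly changed or deleted.
    """
    for slot in parsed_turn.get("_delete", []):
        domain, name = slot.split("-", 1)
        state.get(domain, {}).pop(name, None)
    for domain, slots in parsed_turn.items():
        if domain != "_delete":
            state.setdefault(domain, {}).update(slots)
    return state

state = {"hotel": {"area": "east", "parking": "yes"}}
turn = json.loads('{"hotel": {"area": "north"}, "_delete": ["hotel-parking"]}')
state = update_dialogue_state(state, turn)
```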
( 2
min )
To process sensor data in the Internet of Things (IoT), embedded deep
learning for 1-dimensional data is an important technique. In the past, CNNs
were frequently used because they are simple to optimise for specialised
embedded hardware such as FPGAs. This work proposes a novel LSTM cell
optimisation aimed at energy-efficient inference on end devices. Using traffic
speed prediction as a case study, a vanilla LSTM model with the optimised LSTM
cell achieves 17,534 inferences per second while consuming only 3.8 $\mu$J per
inference on the FPGA XC7S15 from the Spartan-7 family. It achieves at least
5.4$\times$ higher throughput and is 1.37$\times$ more energy efficient than
existing approaches.
( 2
min )
The ability to construct a realistic simulator of financial exchanges,
including reproducing the dynamics of the limit order book, can give insight
into many counterfactual scenarios, such as a flash crash, a margin call, or
changes in macroeconomic outlook. In recent years, agent-based models have been
developed that reproduce many features of an exchange, as summarised by a set
of stylised facts and statistics. However, the ability to calibrate simulators
to a specific period of trading remains an open challenge. In this work, we
develop a novel approach to the calibration of market simulators by leveraging
recent advances in deep learning, specifically using neural density estimators
and embedding networks. We demonstrate that our approach is able to correctly
identify high probability parameter sets, both when applied to synthetic and
historical data, and without reliance on manually selected or weighted
ensembles of stylised facts.
( 2
min )
Normalizing flows (NF) recently gained attention as a way to construct
generative networks with exact likelihood calculation out of composable layers.
However, NFs are restricted to dimension-preserving transformations. Surjection
VAE (SurVAE) has been proposed to extend NF to dimension-altering
transformations. Such networks are desirable because they are expressive and
can be precisely trained. We show that the approaches are a re-invention of PDF
projection, which appeared over twenty years earlier and is much further
developed.
( 2
min )
We present a new method that includes three key components of distributed
optimization and federated learning: variance reduction of stochastic
gradients, partial participation, and compressed communication. We prove that
the new method has optimal oracle complexity and state-of-the-art communication
complexity in the partial participation setting. Regardless of the
communication compression feature, our method successfully combines variance
reduction and partial participation: we get the optimal oracle complexity,
never need the participation of all nodes, and do not require the bounded
gradients (dissimilarity) assumption.
( 2
min )
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingly
popular in financial applications, owing to certain desirable properties that
it enjoys. We consider the problem of estimating UBSR in a recursive setting,
where samples from the underlying loss distribution are available
one-at-a-time. We cast the UBSR estimation problem as a root finding problem,
and propose stochastic approximation-based estimation schemes. We derive
non-asymptotic bounds on the estimation error in the number of samples. We also
consider the problem of UBSR optimization within a parameterized class of
random variables. We propose a stochastic gradient descent based algorithm for
UBSR optimization, and derive non-asymptotic bounds on its convergence.
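Since UBSR is cast as a root-finding problem, the recursive estimator is a Robbins-Monro scheme; a minimal sketch (utility and target here are illustrative placeholders, not the paper's choices):

```python
import random

def ubsr_estimate(sample_loss, utility, target=0.0, steps=5000, seed=0):
    """Robbins-Monro recursion for Utility-Based Shortfall Risk (UBSR).

    UBSR is the root t* of g(t) = E[utility(X - t)] - target, where X is the
    loss. Each iteration consumes a single fresh sample, matching the
    one-at-a-time setting, and uses the diminishing step size a_n = 1/n.
    """
    rng = random.Random(seed)
    t = 0.0
    for n in range(1, steps + 1):
        x = sample_loss(rng)                       # one loss sample
        t += (1.0 / n) * (utility(x - t) - target)
    return t

# Sanity check with a linear utility l(y) = y and target 0: the root of
# E[X - t] = 0 is t* = E[X], and the recursion reduces to a running mean.
est = ubsr_estimate(lambda rng: rng.gauss(1.0, 0.1), lambda y: y)
```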
( 2
min )
Artificial neural networks can be represented by paths. When these paths are
generated as random walks on a dense network graph, we find that the resulting
sparse networks allow for deterministic initialization and even weights with
fixed sign. Such
networks can be trained sparse from scratch, avoiding the expensive procedure
of training a dense network and compressing it afterwards. Although sparse,
weights are accessed as contiguous blocks of memory. In addition, enumerating
the paths using deterministic low discrepancy sequences, for example the Sobol'
sequence, amounts to connecting the layers of neural units by progressive
permutations, which naturally avoids bank conflicts in parallel computer
hardware. We demonstrate that the artificial neural networks generated by low
discrepancy sequences can achieve an accuracy within reach of their dense
counterparts at a much lower computational complexity.
( 2
min )
In the multi-armed bandit framework, there are two formulations that are
commonly employed to handle time-varying reward distributions: adversarial
bandit and nonstationary bandit. Although their oracles, algorithms, and regret
analysis differ significantly, we provide a unified formulation in this paper
that smoothly bridges the two as special cases. The formulation uses an oracle
that takes the best-fixed arm within time windows. Depending on the window
size, it turns into the oracle in hindsight in the adversarial bandit and
dynamic oracle in the nonstationary bandit. We provide algorithms that attain
the optimal regret with the matching lower bound.
( 2
min )
Deep neural networks (DNNs), the agents of deep learning (DL), require a
massive number of parallel/sequential operations. This makes it difficult to
comprehend DNNs' operations and impedes proper diagnosis. Without better
knowledge of their internal process, deploying DNNs in high-stakes domains can
lead to catastrophic failures. Therefore, to build more reliable DNNs/DL to be
deployed in high-stakes real-world problems, it is imperative that we gain
insights into DNNs' internal operations underlying their decision-making. Here,
we use the self-organizing map (SOM) to analyze DL models' internal codes
associated with DNNs' decision-making. Our analyses suggest that shallow layers
close to the input layer compress features into condensed space and that deep
layers close to the output layer expand feature space. We also found evidence
indicating that compressed features may underlie DNNs' vulnerabilities to
adversarial perturbations.
( 2
min )
In a high-dimensional regression framework, we study the consequences of the
naive two-step procedure where first the dimension of the input variables is
reduced and second, the reduced input variables are used to predict the output
variable with kernel regression. In order to analyze the resulting regression
errors, a novel stability result for kernel regression with respect to the
Wasserstein distance is derived. This allows us to bound errors that occur when
perturbed input data is used to fit the regression function. We apply the
general stability result to principal component analysis (PCA). Exploiting
known estimates from the literature on both principal component analysis and
kernel regression, we deduce convergence rates for the two-step procedure. The
latter turns out to be particularly useful in a semi-supervised setting.
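As a concrete illustration of the two-step procedure, here is a sketch in plain numpy (PCA by SVD followed by Gaussian kernel ridge regression; the bandwidth and ridge values are arbitrary choices, not the paper's):

```python
import numpy as np

def pca_kernel_regression(X, y, X_new, k=2, bandwidth=1.0, ridge=1e-3):
    """Naive two-step procedure: PCA reduction, then Gaussian kernel regression.

    Step 1 projects inputs onto the top-k principal components; step 2 fits
    kernel ridge regression on the reduced coordinates. The stability result
    bounds the extra error incurred by regressing on these perturbed
    (projected) inputs.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                   # top-k principal directions
    Z, Z_new = Xc @ P, (X_new - X.mean(axis=0)) @ P
    # Gaussian kernel ridge regression on the reduced inputs.
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * bandwidth ** 2))
    alpha = np.linalg.solve(K + ridge * np.eye(len(Z)), y)
    d2_new = ((Z_new[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2_new / (2 * bandwidth ** 2)) @ alpha
```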
( 2
min )
Linear regression is one of the most fundamental linear algebra problems.
Given a dense matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b$, the goal
is to find $x'$ such that
$ \| Ax' - b \|_2^2 \leq (1+\epsilon) \min_{x} \| A x - b \|_2^2 $. The best
classical algorithm takes $O(nd) + \mathrm{poly}(d/\epsilon)$ time [Clarkson
and Woodruff STOC 2013, Nelson and Nguyen FOCS 2013]. On the other hand,
quantum linear regression algorithms can achieve exponential quantum speedups,
as shown in [Wang Phys. Rev. A 96, 012335, Kerenidis and Prakash ITCS 2017,
Chakraborty, Gily{\'e}n and Jeffery ICALP 2019]. However, the running times of
these algorithms depend on some quantum linear algebra-related parameters, such
as $\kappa(A)$, the condition number of $A$. In this work, we develop a quantum
algorithm that runs in $\widetilde{O}(\epsilon^{-1}\sqrt{n}d^{1.5}) +
\mathrm{poly}(d/\epsilon)$ time. It provides a quadratic quantum speedup in $n$
over the classical lower bound without any dependence on data-dependent
parameters. In addition, we also show our result can be generalized to multiple
regression and ridge linear regression.
( 2
min )
Mini-EUSO is a wide-angle fluorescence telescope that registers ultraviolet
(UV) radiation in the nocturnal atmosphere of Earth from the International
Space Station. Meteors are among multiple phenomena that manifest themselves
not only in the visible range but also in the UV. We present two simple
artificial neural networks that allow for recognizing meteor signals in the
Mini-EUSO data with high accuracy in terms of a binary classification problem.
We expect that similar architectures can be effectively used for signal
recognition in other fluorescence telescopes, regardless of the nature of the
signal. Due to their simplicity, the networks can be implemented in onboard
electronics of future orbital or balloon experiments.
( 3
min )
This document describes an approach used in the Multi-Machine Disruption
Prediction Challenge for Fusion Energy by ITU, a data science competition which
ran from September to November 2023, on the online platform Zindi. The
competition involved data from three fusion devices - C-Mod, HL-2A, and J-TEXT
- with most of the training data coming from the last two, and the test data
coming from the first one. Each device has multiple diagnostics and signals,
and it turns out that a critical issue in this competition was to identify
which signals, and especially which features from those signals, were most
relevant to achieve accurate predictions. The approach described here is based
on extracting features from signals, and then applying logistic regression on
top of those features. Each signal is treated as a separate predictor and, in
the end, a combination of such predictors achieved the first place on the
leaderboard.
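The winning recipe (features per signal, then logistic regression) can be illustrated with a deliberately simplified sketch; the summary-statistic features and the hand-rolled gradient-descent fit below are stand-ins for the competition entry's actual features and solver:

```python
import numpy as np

def signal_features(sig):
    """Summary-statistic features for one diagnostic signal (illustrative)."""
    return np.array([sig.mean(), sig.std(), sig.max(), sig.min(),
                     np.abs(np.diff(sig)).mean()])

def fit_logistic(F, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression on standardized features."""
    mu, sd = F.mean(0), F.std(0) + 1e-9
    Fs = (F - mu) / sd
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Fs @ w + b)))
        w -= lr * Fs.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return lambda Fn: 1.0 / (1.0 + np.exp(-(((Fn - mu) / sd) @ w + b)))
```

Each signal gets its own such predictor; the final disruption score is a combination (for example an average) of the per-signal probabilities.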
( 2
min )
On dedicated analog hardware, equilibrium propagation is an energy-efficient
alternative to backpropagation. In spite of its theoretical guarantees, its
application in the AI domain remains limited to the discriminative setting.
Meanwhile, despite its high computational demands, generative AI is on the
rise. In this paper, we demonstrate the application of Equilibrium Propagation
in training a variational autoencoder (VAE) for generative modeling. Leveraging
the symmetric nature of Hopfield networks, we propose using a single model to
serve as both the encoder and decoder which could effectively halve the
required chip size for VAE implementations, paving the way for more efficient
analog hardware configurations.
( 2
min )
Although gradient descent with momentum is widely used in modern deep
learning, a concrete understanding of its effects on the training trajectory
still remains elusive. In this work, we empirically show that momentum gradient
descent with a large learning rate and learning rate warmup displays large
catapults, driving the iterates towards flatter minima than those found by
gradient descent. We then provide empirical evidence and theoretical intuition
that the large catapult is caused by momentum "amplifying" the
self-stabilization effect (Damian et al., 2023).
( 2
min )
A key problem in off-policy Reinforcement Learning (RL) is the mismatch, or
distribution shift, between the dataset and the distribution over states and
actions visited by the learned policy. This problem is exacerbated in the fully
offline setting. The main approach to correct this shift has been through
importance sampling, which leads to high-variance gradients. Other approaches,
such as conservatism or behavior-regularization, regularize the policy at the
cost of performance. In this paper, we propose a new approach for stable
off-policy Q-Learning. Our method, Projected Off-Policy Q-Learning (POP-QL), is
a novel actor-critic algorithm that simultaneously reweights off-policy samples
and constrains the policy to prevent divergence and reduce value-approximation
error. In our experiments, POP-QL not only shows competitive performance on
standard benchmarks, but also out-performs competing methods in tasks where the
data-collection policy is significantly sub-optimal.
( 2
min )
Foundation models, specifically Large Language Models (LLMs), have recently
gained widespread attention and adoption. Reinforcement Learning with Human
Feedback (RLHF) involves training a reward model to capture desired behaviors,
which is then used to align an LLM. These reward models are additionally used
at inference-time to estimate how well LLM responses adhere to those desired
behaviors. However, there is little work measuring how robust these reward
models are to distribution shifts. In this work, we evaluate how reward model
performance - measured via accuracy and calibration (i.e. alignment between
accuracy and confidence) - is affected by distribution shift. We show novel
calibration patterns and accuracy drops due to OOD prompts and responses, and
that the reward model is more sensitive to shifts in responses than prompts.
Additionally, we adapt an OOD detection technique commonly used in
classification to the reward model setting in order to detect these
distribution shifts in prompts and responses.
( 2
min )
In this research, we developed a graph-based framework to represent various
aspects of optimal thermal management system design, with the aim of rapidly
and efficiently identifying optimal design candidates. Initially, the
graph-based framework is utilized to generate diverse thermal management system
architectures. The dynamics of these system architectures are modeled under
various loading conditions, and an open-loop optimal controller is employed to
determine each system's optimal performance. These modeled cases constitute the
dataset, with the corresponding optimal performance values serving as the
labels for the data. In the subsequent step, a Graph Neural Network (GNN) model
is trained on 30% of the labeled data to predict the systems' performance,
effectively addressing a regression problem. Utilizing this trained model, we
estimate the performance values for the remaining 70% of the data, which serves
as the test set. In the third step, the predicted performance values are
employed to rank the test data, facilitating prioritized evaluation of the
design scenarios. Specifically, a small subset of the test data with the
highest estimated ranks undergoes evaluation via the open-loop optimal control
solver. This targeted approach concentrates on evaluating higher-ranked designs
identified by the GNN, replacing the exhaustive search (enumeration-based) of
all design cases. The results demonstrate a significant average reduction of
over 92% in the number of system dynamic modeling and optimal control analyses
required to identify optimal design scenarios.
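The three-step workflow can be sketched end to end; a plain least-squares model stands in for the GNN here (the 30% training split and small top fraction mirror the text, everything else is illustrative):

```python
import numpy as np

def surrogate_ranked_search(X, evaluate, train_frac=0.3, top_frac=0.05, seed=0):
    """Surrogate-assisted design search, mirroring the three steps above.

    1) Label a 30% training split with the expensive evaluator.
    2) Fit a cheap surrogate (here: linear least squares, standing in for the
       GNN) and predict performance on the remaining 70%.
    3) Run the evaluator only on the top-ranked fraction instead of
       enumerating every design.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    train, test = idx[:n_train], idx[n_train:]
    y_train = np.array([evaluate(X[i]) for i in train])      # expensive labels
    A = np.c_[X[train], np.ones(len(train))]
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)          # surrogate fit
    preds = np.c_[X[test], np.ones(len(test))] @ w           # cheap predictions
    k = max(1, int(top_frac * len(test)))
    shortlist = test[np.argsort(preds)[-k:]]                 # highest predicted
    best = max(shortlist, key=lambda i: evaluate(X[i]))      # targeted re-check
    return best, n_train + k                                 # evaluator calls used
```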
( 3
min )
Since no solutions have been proposed in Colombia that seek to reduce the
consumption of electricity at the residential level, this paper describes the
design and implementation of a simple prototype of a low-cost home energy
management system (HEMS). The objective of this plat-form is to monitor the
energy consumption of typical household devices so that users can access the
consumption of each device separately and then establish the strategy that
allows them to reduce energy consumption at home. In order to demonstrate that
our system is viable, it has been evaluated by measuring weekly energy
consumption with the online and offline HEMS using a test bench with typical
household devices in a typical Sincelejo household. The evaluation has shown
that with the installation of this HEMS, consumption is reduced by 27%. This
shows that it is possible to achieve a good reduction percentage with a
low-cost system.
( 2
min )
This paper investigates an approach to both speed up business decision-making
and lower the cost of learning through experimentation by factorizing business
policies and employing fractional factorial experimental designs for their
evaluation. We illustrate how this method integrates with advances in the
estimation of heterogeneous treatment effects, elaborating on its advantages
and foundational assumptions. We empirically demonstrate the implementation and
benefits of our approach and assess its validity in evaluating consumer
promotion policies at DoorDash, which is one of the largest delivery platforms
in the US. Our approach discovers a policy with 5% incremental profit at 67%
lower implementation cost.
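A fractional factorial design halves (or further reduces) the number of policy cells to test; here is a sketch of a standard half-fraction construction (generic, not DoorDash's actual design):

```python
from itertools import product

def half_fraction(k):
    """2^(k-1) fractional factorial design for k two-level factors.

    The first k-1 factors take all +/-1 combinations; the last factor is set
    to the product of the others (defining relation I = AB...K), halving the
    number of experimental cells while keeping main effects estimable,
    confounded only with high-order interactions.
    """
    runs = []
    for levels in product((-1, 1), repeat=k - 1):
        last = 1
        for v in levels:
            last *= v
        runs.append(levels + (last,))
    return runs

design = half_fraction(4)   # 8 runs instead of the 16 of a full 2^4 factorial
```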
( 2
min )
There is growing concern that the potential of black box AI may exacerbate
health-related disparities and biases such as gender and ethnicity in clinical
decision-making. Biased decisions can arise from data availability and
collection processes, as well as from the underlying confounding effects of the
protected attributes themselves. This work proposes a machine learning-based
orthogonal approach aiming to analyze and suppress the effect of the confounder
through discriminant dimensionality reduction and orthogonalization of the
protected attributes against the primary attribute information. By doing so,
the impact of the protected attributes on disease diagnosis can be realized,
undesirable feature correlations can be mitigated, and the model prediction
performance can be enhanced.
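The core orthogonalization step can be illustrated as a linear residualization (a minimal linear sketch of the idea only; the paper's method also involves discriminant dimensionality reduction, which is omitted here):

```python
import numpy as np

def orthogonalize(X, protected):
    """Remove the linear component of a protected attribute from features.

    Each feature column is regressed on the protected attribute (plus an
    intercept) and replaced by its residual, so the returned features carry
    no linear correlation with the confounder.
    """
    Z = np.c_[np.ones(len(X)), protected]           # intercept + protected attr
    beta, *_ = np.linalg.lstsq(Z, X, rcond=None)    # per-feature regression
    return X - Z @ beta                             # residuals orthogonal to Z
```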
( 2
min )
With the rise of Large Language Models (LLMs), notably characterized by GPT
frameworks, there emerges a catalyst for novel healthcare applications. Earlier
iterations of chatbot caregivers, though existent, have yet to achieve a
dimension of human-like authenticity. This paper unveils `MemoryCompanion', a
pioneering digital health solution explicitly tailored for Alzheimer's disease
(AD) patients and their caregivers. Drawing upon the nuances of GPT technology
and prompt engineering, MemoryCompanion manifests a personalized caregiving
paradigm, fostering interactions via voice-cloning and talking-face mechanisms
that resonate with the familiarity of known companions. Using advanced
prompt-engineering, the system intricately adapts to each patient's distinct
profile, curating its content and communication style accordingly. This
approach strives to counteract prevalent issues of social isolation and
loneliness frequently observed in AD demographics. Our methodology, grounded in
its innovative design, addresses both the caregiving and technological
challenges intrinsic to this domain.
( 2
min )
In this work, we present a method to generate a configurational level
fingerprint for polymers using the Bead-Spring-Model. Unlike some of the
previous fingerprinting approaches that employ monomer-level information where
atomistic descriptors are computed using quantum chemistry calculations, this
approach incorporates configurational information from a coarse-grained model
of a long polymer chain. The proposed approach may be advantageous for the
study of behavior resulting from large molecular weights. To create this
fingerprint, we make use of two kinds of descriptors. First, we calculate
certain geometric descriptors, such as the squared end-to-end distance (Re2)
and the squared radius of gyration (Rg2), and label them as Calculated
Descriptors. Second, we generate a set of data-driven descriptors using an
unsupervised autoencoder model and call them Learnt Descriptors. Using a
combination of both of them, we are able to learn mappings from the structure
to various properties of the polymer chain by training ML models. We test our
fingerprint to predict the probability of occurrence of a configuration at
equilibrium, which is approximated by a simple linear relationship between the
instantaneous internal energy and equilibrium average internal energy.
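Two of the geometric (Calculated) descriptors are standard polymer-physics quantities and are straightforward to compute from bead coordinates; a sketch assuming an (N, 3) coordinate array:

```python
import numpy as np

def rg_squared(coords):
    """Squared radius of gyration of a bead-spring chain:
    Rg^2 = (1/N) * sum_i |r_i - r_cm|^2."""
    centered = coords - coords.mean(axis=0)
    return (centered ** 2).sum(axis=1).mean()

def re_squared(coords):
    """Squared end-to-end distance Re^2 between the first and last beads."""
    return ((coords[-1] - coords[0]) ** 2).sum()
```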
( 2
min )
Through the advancement in natural language processing (NLP), specifically in
speech recognition, fully automated complex systems functioning on voice input
have started proliferating in areas such as home automation. These systems have
been termed Automatic Speech Recognition (ASR) systems. In this review paper,
we explore the feasibility of an end-to-end system providing speech and text
based natural language processing for job interview preparation as well as
recommendation of relevant job postings. We also explore existing
recommender-based systems and note their limitations. This literature review
would help us identify the approaches and limitations of the various similar
use-cases of NLP technology for our upcoming project.
( 2
min )
Amazon Web Services and NVIDIA will bring the latest generative AI technologies to enterprises worldwide. Combining AI and cloud computing, NVIDIA founder and CEO Jensen Huang joined AWS CEO Adam Selipsky Tuesday on stage at AWS re:Invent 2023 at the Venetian Expo Center in Las Vegas. Selipsky said he was “thrilled” to announce the expansion Read article >
( 6
min )
Researchers and developers at leading pharmaceutical and techbio companies can now easily deploy NVIDIA Clara software and services for accelerated healthcare through Amazon Web Services. Announced today at AWS re:Invent, the initiative gives healthcare and life sciences developers using AWS cloud resources the flexibility to integrate NVIDIA-accelerated offerings such as NVIDIA BioNeMo — a generative Read article >
( 6
min )
Developing more intelligent robots in the cloud is about to get a speed multiplier. NVIDIA Isaac Sim and NVIDIA L40S GPUs are coming to Amazon Web Services, enabling developers to build and deploy accelerated robotics applications in the cloud. Isaac Sim, an extensible simulator for AI-enabled robots, is built on the NVIDIA Omniverse development platform Read article >
( 6
min )
Everything about large language models is big — giant models train on massive datasets across thousands of NVIDIA GPUs. That can pose a lot of big challenges for companies pursuing generative AI. NVIDIA NeMo, a framework for building, customizing and running LLMs, helps overcome these challenges. A team of experienced scientists and developers at Amazon Read article >
( 5
min )
This week’s talented In the NVIDIA Studio artist, Nourhan Ismail, created a literal NVIDIA studio.
( 7
min )
The immediate and pressing need for ‘digitizing’ your supply-chain. One may conclude: ‘digitizing’ the supply-chain has become a survival necessity for companies to stay competitive. Apart from a substantial jump in efficiency and effectiveness, the customer experience, and upside to revenues, companies can expect a huge cost-saving… A Look at the Future: Components of Data-driven (Digital) Supply-chain… Read More »Data-driven, AI-powered supply chain part 3: Imagining the Future – Supply chain 5.0
The post Data-driven, AI-powered supply chain part 3: Imagining the Future – Supply chain 5.0 appeared first on Data Science Central.
( 25
min )
The viability of the ‘Viable Vision’. I did hear about the Theory of Constraints (TOC) off and on through the late 90s, but I didn’t pay much attention until late 2001. One of the i2 consultants I met at their annual meet in Malaysia had one too many, and ended up lecturing me on how… Read More »Data-driven supply chain part 2: The theory of constraints & the concept of the information supply chain.
The post Data-driven supply chain part 2: The theory of constraints & the concept of the information supply chain. appeared first on Data Science Central.
( 28
min )
While the world is going wild over the potential benefits of generative AI, there’s little attention paid to the data deployed to build and operate these tools. Let’s look at a few examples to explore what’s involved in determining data use, and why this matters for end users as well as operators. Text-based generative AI… Read More »Here’s How Much Data Gets Used By Generative AI Tools For Each Request
The post Here’s How Much Data Gets Used By Generative AI Tools For Each Request appeared first on Data Science Central.
( 21
min )
Earlier in the fall, Charles Hoffman joined our non-profit Dataworthy Collective (DC) that focuses on best practices in trusted knowledge graph development. Hoffman is a CPA, consultant and former PwC auditor who works with clients who use the Extensible Business Reporting Language (XBRL). For those who don’t know the history of standard digital business reporting,… Read More »Trusted, automated data sharing across spreadsheets and other documents
The post Trusted, automated data sharing across spreadsheets and other documents appeared first on Data Science Central.
( 20
min )
Learning unsupervised world models for autonomous driving has the potential
to improve the reasoning capabilities of today's systems dramatically. However,
most work neglects the physical attributes of the world and focuses on sensor
data alone. We propose MUVO, a MUltimodal World Model with Geometric VOxel
Representations to address this challenge. We utilize raw camera and lidar data
to learn a sensor-agnostic geometric representation of the world, which can
directly be used by downstream tasks, such as planning. We demonstrate
multimodal future predictions and show that our geometric representation
improves the prediction quality of both camera images and lidar point clouds.
( 2
min )
In echocardiographic view classification, accurately detecting
out-of-distribution (OOD) data is essential but challenging, especially given
the subtle differences between in-distribution and OOD data. While conventional
OOD detection methods, such as Mahalanobis distance (MD), are effective in
far-OOD scenarios with clear distinctions between distributions, they struggle
to discern the less obvious variations characteristic of echocardiographic
data. In this study, we introduce a novel use of label smoothing to enhance
semantic feature representation in echocardiographic images, demonstrating that
these enriched semantic features are key for significantly improving near-OOD
instance detection. By combining label smoothing with MD-based OOD detection,
we establish a new benchmark for accuracy in echocardiographic OOD detection.
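The MD scoring step (not the label-smoothing training, which is the paper's actual contribution) can be sketched using class-conditional means and a shared covariance over feature vectors:

```python
import numpy as np

def mahalanobis_ood_score(feats_train, labels, feats_test, eps=1e-6):
    """Minimum class-conditional Mahalanobis distance as an OOD score.

    Class means and a shared covariance are estimated from in-distribution
    feature vectors (e.g. penultimate-layer activations); a test sample far
    from every class mean receives a high score.
    """
    classes = np.unique(labels)
    means = {c: feats_train[labels == c].mean(0) for c in classes}
    centered = np.concatenate([feats_train[labels == c] - means[c]
                               for c in classes])
    cov = centered.T @ centered / len(centered) + eps * np.eye(feats_train.shape[1])
    prec = np.linalg.inv(cov)
    scores = []
    for x in feats_test:
        d = min(float((x - m) @ prec @ (x - m)) for m in means.values())
        scores.append(d)
    return np.array(scores)
```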
( 2
min )
Tabular data is hard to acquire and is subject to missing values. This paper
proposes a novel approach to generate and impute mixed-type (continuous and
categorical) tabular data using score-based diffusion and conditional flow
matching. Contrary to previous work that relies on neural networks to learn the
score function or the vector field, we instead rely on XGBoost, a popular
Gradient-Boosted Tree (GBT) method. We empirically show on 27 different
datasets that our approach i) generates highly realistic synthetic data when
the training dataset is either clean or tainted by missing data and ii)
generates diverse plausible data imputations. Furthermore, our method
outperforms deep-learning generation methods on data generation and is
competitive on data imputation. Finally, it can be trained in parallel using
CPUs without the need for a GPU. To make it easily accessible, we release our
code through a Python library and an R package.
( 2
min )
A common forecasting setting in real world applications considers a set of
possibly heterogeneous time series of the same domain. Due to different
properties of each time series such as length, obtaining forecasts for each
individual time series in a straightforward way is challenging. This paper
proposes a general framework utilizing a similarity measure in Dynamic Time
Warping to find similar time series to build neighborhoods in a k-Nearest
Neighbor fashion, and improve forecasts of possibly simple models by averaging.
Several ways of performing the averaging are suggested, and theoretical
arguments underline the usefulness of averaging for forecasting. Additionally,
diagnostics tools are proposed allowing a deep understanding of the procedure.
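A minimal sketch of the procedure (a textbook O(nm) DTW and a plain mean over the neighborhood; the framework proposes several averaging variants beyond this):

```python
import numpy as np

def dtw(a, b):
    """Dynamic Time Warping distance between two (possibly unequal-length) series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def neighbor_averaged_forecast(series, forecasts, target_idx, k=3):
    """Average the base forecasts of the k nearest neighbors under DTW.

    `forecasts[i]` is a forecast from a possibly simple model for series i;
    the target's forecast is replaced by the mean over its DTW neighborhood.
    """
    dists = [(dtw(series[target_idx], s), i)
             for i, s in enumerate(series) if i != target_idx]
    neighbors = [i for _, i in sorted(dists)[:k]]
    return np.mean([forecasts[i] for i in neighbors], axis=0)
```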
( 2
min )
Recent results show that estimates defined by over-parametrized deep neural
networks learned by applying gradient descent to a regularized empirical $L_2$
risk are universally consistent and achieve good rates of convergence. In this
paper, we show that the regularization term is not necessary to obtain similar
results. In the case of a suitably chosen initialization of the network, a
suitable number of gradient descent steps, and a suitable step size we show
that an estimate without a regularization term is universally consistent for
bounded predictor variables. Additionally, we show that if the regression
function is H\"older smooth with H\"older exponent $1/2 \leq p \leq 1$, the
$L_2$ error converges to zero with a convergence rate of approximately
$n^{-1/(1+d)}$. Furthermore, in case of an interaction model, where the
regression function consists of a sum of H\"older smooth functions with $d^*$
components, a rate of convergence is derived which does not depend on the input
dimension $d$.
( 2
min )
Amazon Elastic Compute Cloud (Amazon EC2) accelerated computing portfolio offers the broadest choice of accelerators to power your artificial intelligence (AI), machine learning (ML), graphics, and high performance computing (HPC) workloads. We are excited to announce the expansion of this portfolio with three new instances featuring the latest NVIDIA GPUs: Amazon EC2 P5e instances powered […]
( 4
min )
Today, Amazon SageMaker launches a new version (0.25.0) of Large Model Inference (LMI) Deep Learning Containers (DLCs) and adds support for NVIDIA’s TensorRT-LLM Library. With these upgrades, you can effortlessly access state-of-the-art tooling to optimize large language models (LLMs) on SageMaker and achieve price-performance benefits – Amazon SageMaker LMI TensorRT-LLM DLC reduces latency by 33% […]
( 9
min )
Generative artificial intelligence (generative AI) models have demonstrated impressive capabilities in generating high-quality text, images, and other content. However, these models require massive amounts of clean, structured training data to reach their full potential. Most real-world data exists in unstructured formats like PDFs, which requires preprocessing before it can be used effectively. According to IDC, […]
( 10
min )
This post is co-authored by Daryl Martis, Director of Product, Salesforce Einstein AI. This is the third post in a series discussing the integration of Salesforce Data Cloud and Amazon SageMaker. In Part 1 and Part 2, we show how the Salesforce Data Cloud and Einstein Studio integration with SageMaker allows businesses to access their […]
( 12
min )
Artificial intelligence (AI) continues to transform how we do business and serve our customers. AWS offers a range of pre-trained AI services that provide ready-to-use intelligence for your applications. In this post, we explore the new AI service capabilities and how they are enhanced using foundation models (FMs). We focus on the following major updates […]
( 7
min )
In this post, we talk about how generative AI is changing the conversational AI industry by providing new customer and bot builder experiences, and the new features in Amazon Lex that take advantage of these advances. As the demand for conversational AI continues to grow, developers are seeking ways to enhance their chatbots with human-like […]
( 7
min )
Human Guided Exploration (HuGE) enables AI agents to learn quickly with some help from humans, even if the humans make mistakes.
( 11
min )
Amazon Transcribe is a fully managed automatic speech recognition (ASR) service that makes it straightforward for you to add speech-to-text capabilities to your applications. Today, we are happy to announce a next-generation multi-billion parameter speech foundation model-powered system that expands automatic speech recognition to over 100 languages. In this post, we discuss some of the […]
( 7
min )
Today, we are excited to announce three launches that will help you enhance personalized customer experiences using Amazon Personalize and generative AI. Whether you’re looking for a managed solution or build your own, you can use these new capabilities to power your journey. Amazon Personalize is a fully managed machine learning (ML) service that makes […]
( 8
min )
Amazon Personalize is excited to announce the new Next Best Action (aws-next-best-action) recipe to help you determine the best actions to suggest to your individual users that will enable you to increase brand loyalty and conversion. Amazon Personalize is a fully managed machine learning (ML) service that makes it effortless for developers to deliver highly […]
( 8
min )
NVIDIA today launched a cloud service for medical imaging AI to further streamline and accelerate the creation of ground-truth data and training of specialized AI models through fully managed, cloud-based application programming interfaces. NVIDIA MONAI cloud APIs — announced at the annual meeting of RSNA, the Radiological Society of North America, taking place this week Read article >
( 7
min )
This post is co-written with Marc Neumann, Amor Steinberg and Marinus Krommenhoek from BMW Group. The BMW Group – headquartered in Munich, Germany – is driven by 149,000 employees worldwide and manufactures in over 30 production and assembly facilities across 15 countries. Today, the BMW Group is the world’s leading manufacturer of premium automobiles and […]
( 11
min )
In today’s ever-evolving world of ecommerce, the influence of a compelling product description cannot be overstated. It can be the decisive factor that turns a potential visitor into a paying customer or sends them clicking off to a competitor’s site. The manual creation of these descriptions across a vast array of products is a labor-intensive […]
( 9
min )
Amazon SageMaker Canvas is a rich, no-code Machine Learning (ML) and Generative AI workspace that has allowed customers all over the world to more easily adopt ML technologies to solve old and new challenges thanks to its visual, no-code interface. It does so by covering the ML workflow end-to-end: whether you’re looking for powerful data […]
( 9
min )
This post was co-written with Greg Benson, Chief Scientist; Aaron Kesler, Sr. Product Manager; and Rich Dill, Enterprise Solutions Architect from SnapLogic. Many customers are building generative AI apps on Amazon Bedrock and Amazon CodeWhisperer to create code artifacts based on natural language. This use case highlights how large language models (LLMs) are able to […]
( 17
min )
As a surrogate for computationally intensive meso-scale simulation of woven
composites, this article presents Recurrent Neural Network (RNN) models.
Leveraging the power of transfer learning, the initialization challenges and
sparse data issues inherent in cyclic shear strain loads are addressed in the
RNN models. A mean-field model generates a comprehensive data set representing
elasto-plastic behavior. In simulations, arbitrary six-dimensional strain
histories are used to predict stresses, with random-walk loading as the source
task and cyclic loading conditions as the target task. Incorporating sub-scale
properties enhances RNN versatility. In order to achieve accurate predictions,
the model uses a grid search method to tune network architecture and
hyper-parameter configurations. The results of this study demonstrate that
transfer learning can be used to effectively adapt the RNN to varying strain
conditions, which establishes its potential as a useful tool for modeling
path-dependent responses in woven composites.
( 2
min )
In safety-critical domains such as autonomous driving and medical diagnosis,
the reliability of machine learning models is crucial. One significant
challenge to reliability is concept drift, which can cause model deterioration
over time. Traditionally, drift detectors rely on true labels, which are often
scarce and costly. This study conducts a comprehensive empirical evaluation of
using uncertainty values as substitutes for error rates in detecting drifts,
aiming to alleviate the reliance on labeled post-deployment data. We examine
five uncertainty estimation methods in conjunction with the ADWIN detector
across seven real-world datasets. Our results reveal that while the SWAG method
exhibits superior calibration, the overall accuracy in detecting drifts is not
notably impacted by the choice of uncertainty estimation method, with even the
most basic method demonstrating competitive performance. These findings offer
valuable insights into the practical applicability of uncertainty-based drift
detection in real-world, safety-critical applications.
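The label-free idea can be sketched with a minimal two-window detector: a simplified stand-in for ADWIN (which adapts its window size) that flags drift when the mean predictive uncertainty in a recent window departs from a reference window. The window sizes and threshold below are illustrative assumptions, not values from the study:

```python
from collections import deque

class UncertaintyDriftDetector:
    """Two-window drift check on per-sample predictive uncertainty:
    a simplified stand-in for ADWIN that needs no true labels."""

    def __init__(self, ref_size=100, recent_size=30, threshold=0.15):
        self.ref = deque(maxlen=ref_size)        # early "in-distribution" scores
        self.recent = deque(maxlen=recent_size)  # latest scores
        self.threshold = threshold

    def update(self, uncertainty):
        """Feed one uncertainty score (e.g. predictive entropy); True = drift."""
        if len(self.ref) < self.ref.maxlen:
            self.ref.append(uncertainty)
            return False
        self.recent.append(uncertainty)
        if len(self.recent) < self.recent.maxlen:
            return False
        gap = abs(sum(self.recent) / len(self.recent)
                  - sum(self.ref) / len(self.ref))
        return gap > self.threshold

detector = UncertaintyDriftDetector()
stream = [0.10] * 200 + [0.60] * 50       # model grows uncertain after sample 200
drift_at = next(i for i, u in enumerate(stream) if detector.update(u))
```

The detector fires shortly after the uncertainty jump, without ever seeing a label — the substitution of uncertainty for error rate that the study evaluates.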
( 2
min )
This paper introduces a new model to generate rhythmically relevant
non-verbal facial behaviors for virtual agents while they speak. The model
demonstrates perceived performance comparable to behaviors directly extracted
from the data and replayed on a virtual agent, in terms of synchronization with
speech and believability. Interestingly, we found that training the model with
two different sets of data, instead of one, did not necessarily improve its
performance. The expressiveness of the people in the dataset and the shooting
conditions are key elements. We also show that employing an adversarial model,
in which fabricated fake examples are introduced during the training phase,
increases the perception of synchronization with speech. A collection of videos
demonstrating the results and code can be accessed at:
https://github.com/aldelb/non_verbal_facial_animation.
( 2
min )
Due to its predominantly asymptomatic or mildly symptomatic progression, lung
cancer is often diagnosed in advanced stages, resulting in poorer survival
rates for patients. As with other cancers, early detection significantly
improves the chances of successful treatment. Early diagnosis can be
facilitated through screening programs designed to detect lung tissue tumors
when they are still small, typically around 3mm in size. However, the analysis
of extensive screening program data is hampered by limited access to medical
experts. In this study, we developed a procedure for identifying potential
malignant neoplastic lesions within lung parenchyma. The system leverages
machine learning (ML) techniques applied to two types of measurements: low-dose
Computed Tomography-based radiomics and metabolomics. Using data from two
Polish screening programs, two ML algorithms were tested, along with various
integration methods, to create a final model that combines both modalities to
support lung cancer screening.
( 2
min )
This manuscript presents an advanced framework for Bayesian learning by
incorporating action and state-dependent signal variances into decision-making
models. This framework is pivotal in understanding complex data-feedback loops
and decision-making processes in various economic systems. Through a series of
examples, we demonstrate the versatility of this approach in different
contexts, ranging from simple Bayesian updating in stable environments to
complex models involving social learning and state-dependent uncertainties. The
paper uniquely contributes to the understanding of the nuanced interplay
between data, actions, outcomes, and the inherent uncertainty in economic
models.
( 2
min )
Convolutional Neural Networks (CNNs) have greatly influenced the field of
Embedded Vision and Edge Artificial Intelligence (AI), enabling powerful
machine learning capabilities on resource-constrained devices. This article
explores the relationship between CNN compute requirements and memory bandwidth
in the context of Edge AI. We delve into the historical progression of CNN
architectures, from the early pioneering models to the current state-of-the-art
designs, highlighting the advancements in compute-intensive operations. We
examine the impact of increasing model complexity on both computational
requirements and memory access patterns. The paper presents a comparative
analysis of the evolving trade-off between compute demands and memory bandwidth
requirements in CNNs. This analysis provides insights into designing efficient
architectures and potential hardware accelerators for enhancing CNN performance
on edge devices.
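The trade-off can be made concrete with back-of-the-envelope arithmetic intensity (MACs per byte of off-chip traffic) for a single convolution layer; the layer shapes and the one-pass traffic model below are illustrative assumptions, not figures from the article:

```python
def conv_arithmetic_intensity(h, w, cin, cout, k, bytes_per_elem=2):
    """MACs per byte of off-chip traffic for one k x k convolution,
    assuming input, weights, and output each move exactly once."""
    macs = h * w * cin * cout * k * k
    traffic = bytes_per_elem * (h * w * cin           # input feature map
                                + k * k * cin * cout  # weights
                                + h * w * cout)       # output feature map
    return macs / traffic

# Hypothetical early (large image, few channels) vs. late (small image,
# many channels) layers of a CNN:
early = conv_arithmetic_intensity(112, 112, 64, 64, 3)
late = conv_arithmetic_intensity(7, 7, 512, 512, 3)
```

Early high-resolution layers come out compute-bound (high intensity), while late layers with many channels are dominated by weight traffic — the kind of shift such an analysis tracks.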
( 2
min )
Bowers and colleagues argue that DNNs are poor models of biological vision
because they often learn to rival human accuracy by relying on strategies that
differ markedly from those of humans. We show that this problem is worsening as
DNNs are becoming larger-scale and increasingly more accurate, and prescribe
methods for building DNNs that can reliably model biological vision.
( 2
min )
Robotic capabilities in object manipulation still fall far short of those of
humans. Beyond years of learning, humans rely heavily on the richness of
information from physical interaction with the environment. In particular,
tactile sensing is crucial in providing such rich feedback. Despite its
potential contributions to robotic manipulation, tactile sensing remains
under-exploited, mainly due to the complexity of the time series provided by
tactile sensors. In this work, we propose a method for assessing grasp stability using
tactile sensing. More specifically, we propose a methodology to extract
task-relevant features and design efficient classifiers to detect object
slippage with respect to individual fingertips. We compare two classification
models: support vector machine and logistic regression. We use highly sensitive
Uskin tactile sensors mounted on an Allegro hand to test and validate our
method. Our results demonstrate that the proposed method is effective in
slippage detection in an online fashion.
( 2
min )
Learning and forecasting stochastic time series is essential in various
scientific fields. However, despite the proposals of nonlinear filters and
deep-learning methods, it remains challenging to capture nonlinear dynamics
from a few noisy samples and predict future trajectories with uncertainty
estimates while maintaining computational efficiency. Here, we propose a fast
algorithm to learn and forecast nonlinear dynamics from noisy time series data.
A key feature of the proposed model is kernel functions applied to projected
lines, enabling fast and efficient capture of nonlinearities in the latent
dynamics. Through empirical case studies and benchmarking, the model
demonstrates its effectiveness in learning and forecasting complex nonlinear
dynamics, offering a valuable tool for researchers and practitioners in time
series analysis.
( 2
min )
When working with multiple variables, they usually contain complex
dependencies that are difficult to control. This article proposes extracting
their individual information, e.g. $\overline{X|Y}$ as a random variable
containing the information from $X$ but with the information about $Y$ removed,
by using the reversible normalization $(x,y) \leftrightarrow
(\bar{x}=\textrm{CDF}_{X|Y=y}(x),y)$. One application is decoupling the
individual information of variables: reversibly transform
$(X_1,\ldots,X_n)\leftrightarrow(\tilde{X}_1,\ldots,\tilde{X}_n)$ so that
together they contain the same information but are independent:
$\forall_{i\neq j}\ \tilde{X}_i\perp \tilde{X}_j,\ \tilde{X}_i\perp X_j$. This
requires detailed models of complex conditional probability distributions,
which is generally a difficult task, but here it can be done through multiple
dependency-reducing iterations using imperfect methods (here HCR: Hierarchical
Correlation Reconstruction). The approach could also be used for direct mutual
information, evaluating direct information transfer without the use of
intermediate variables. For the direction of causality, multi-feature Granger
causality is discussed, e.g. to trace various types of individual information
transfers between such decoupled variables, including propagation time (delay).
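A crude, binned empirical version of this normalization can be sketched as follows; the paper models the conditionals with HCR, whereas here $\textrm{CDF}_{X|Y=y}$ is approximated by pooling samples into equal-count bins of $y$ (the bin count is an arbitrary choice):

```python
import random
from bisect import bisect_right

def conditional_cdf_transform(xs, ys, n_bins=4):
    """Approximate x_bar = CDF_{X|Y=y}(x) by splitting samples into
    equal-count bins of y and using each bin's empirical CDF of x."""
    order = sorted(range(len(ys)), key=lambda i: ys[i])
    per_bin = len(ys) // n_bins
    bin_of = {}
    for rank, i in enumerate(order):
        bin_of[i] = min(rank // per_bin, n_bins - 1)
    sorted_x = {b: sorted(xs[i] for i in range(len(xs)) if bin_of[i] == b)
                for b in range(n_bins)}
    return [bisect_right(sorted_x[bin_of[i]], x) / len(sorted_x[bin_of[i]])
            for i, x in enumerate(xs)]

random.seed(0)
ys = [random.gauss(0, 1) for _ in range(400)]
xs = [y + random.gauss(0, 1) for y in ys]    # X carries information about Y
xbar = conditional_cdf_transform(xs, ys)     # roughly uniform, Y removed
```

Within each $y$-bin the transformed values are just uniform ranks, so the dependence of $X$ on $Y$ is (approximately) stripped out while the mapping stays invertible given $y$.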
( 2
min )
Multi-objective optimization (MOO) aims to optimize multiple, possibly
conflicting objectives with widespread applications. We introduce a novel
interacting particle method for MOO inspired by molecular dynamics simulations.
Our approach combines overdamped Langevin and birth-death dynamics,
incorporating a "dominance potential" to steer particles toward global Pareto
optimality. In contrast to previous methods, our method is able to relocate
dominated particles, making it particularly adept at managing Pareto fronts of
complicated geometries. Our method is also theoretically grounded as a
Wasserstein-Fisher-Rao gradient flow with convergence guarantees. Extensive
experiments confirm that our approach outperforms state-of-the-art methods on
challenging synthetic and real-world datasets.
( 2
min )
By analyzing bacterial data, researchers have discovered thousands of rare new CRISPR systems that have a range of functions and could enable gene editing, diagnostics, and more.
( 10
min )
GeForce NOW is bringing 18 new games to the cloud this week, part of a gratitude-filled GFN Thursday. A collaboration between Chromebook Plus, CD PROJEKT RED and GeForce NOW brought an immersive 3D activation to Times Square over the weekend, containing a hidden Easter egg for Cyberpunk 2077 players. Plus, this holiday season, give the Read article >
( 5
min )
In this article, we consider the problem of approximating a finite set of
data (usually huge in applications) by invariant subspaces generated through a
small set of smooth functions. The invariance is either by translations under a
full-rank lattice or through the action of crystallographic groups. Smoothness
is ensured by stipulating that the generators belong to a Paley-Wiener space
that is selected in an optimal way based on the characteristics of the given
data. To complete our investigation, we analyze the fundamental role played by
the lattice in the process of approximation.
( 2
min )
We study the problem of solving strongly convex and smooth unconstrained
optimization problems using stochastic first-order algorithms. We devise a
novel algorithm, referred to as \emph{Recursive One-Over-T SGD} (\ROOTSGD),
based on an easily implementable, recursive averaging of past stochastic
gradients. We prove that it simultaneously achieves state-of-the-art
performance in both a finite-sample, nonasymptotic sense and an asymptotic
sense. On the nonasymptotic side, we prove risk bounds on the last iterate of
\ROOTSGD with leading-order terms that match the optimal statistical risk with
a unity pre-factor, along with a higher-order term that scales at the sharp
rate of $O(n^{-3/2})$ under the Lipschitz condition on the Hessian matrix. On
the asymptotic side, we show that when a mild, one-point Hessian continuity
condition is imposed, the rescaled last iterate of (multi-epoch) \ROOTSGD
converges asymptotically to a Gaussian limit with the Cram\'{e}r-Rao optimal
asymptotic covariance, for a broad range of step-size choices.
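The recursive one-over-t averaging at the heart of \ROOTSGD can be illustrated in a much-simplified form by applying the same recursion to the iterates of plain SGD (Polyak-Ruppert averaging) on a noisy strongly convex quadratic; this is a sketch of the averaging idea only, not the paper's variance-reduced gradient update:

```python
import random

def sgd_with_recursive_average(grad, x0, steps, lr=0.1, noise=0.5, seed=0):
    """Plain SGD plus a recursive one-over-t running average of the iterates:
    xbar_t = (1 - 1/t) * xbar_{t-1} + x_t / t."""
    rng = random.Random(seed)
    x = xbar = x0
    for t in range(1, steps + 1):
        x -= lr * (grad(x) + rng.gauss(0, noise))  # noisy gradient step
        xbar = (1 - 1 / t) * xbar + x / t          # recursive one-over-t average
    return x, xbar

# f(x) = 0.5 x^2: the last iterate rattles in a noise ball around the
# minimizer 0, while the recursive average suppresses the noise.
x_last, x_avg = sgd_with_recursive_average(lambda x: x, x0=5.0, steps=20000)
```

The recursion needs only O(1) memory per step, which is what makes the "easily implementable, recursive averaging" in the abstract attractive.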
( 2
min )
Machine Learning (ML) and Algorithmic Information Theory (AIT) look at
Complexity from different points of view. We explore the interface between AIT
and Kernel Methods (that are prevalent in ML) by adopting an AIT perspective on
the problem of learning kernels from data, in kernel ridge regression, through
the method of Sparse Kernel Flows. In particular, by looking at the differences
and commonalities between Minimal Description Length (MDL) and Regularization
in Machine Learning (RML), we prove that the method of Sparse Kernel Flows is
the natural approach to adopt to learn kernels from data. This paper shows that
it is not necessary to use the statistical route to derive Sparse Kernel Flows
and that one can work directly with code-lengths and complexities, concepts
that show up in AIT.
( 2
min )
We introduce a new Langevin dynamics based algorithm, called
e-TH$\varepsilon$O POULA, to solve optimization problems with discontinuous
stochastic gradients which naturally appear in real-world applications such as
quantile estimation, vector quantization, CVaR minimization, and regularized
optimization problems involving ReLU neural networks. We demonstrate both
theoretically and numerically the applicability of the e-TH$\varepsilon$O POULA
algorithm. More precisely, under the conditions that the stochastic gradient is
locally Lipschitz in average and satisfies a certain convexity at infinity
condition, we establish non-asymptotic error bounds for e-TH$\varepsilon$O
POULA in Wasserstein distances and provide a non-asymptotic estimate for the
expected excess risk, which can be controlled to be arbitrarily small. Three
key applications in finance and insurance are provided, namely, multi-period
portfolio optimization, transfer learning in multi-period portfolio
optimization, and insurance claim prediction, which involve neural networks
with (Leaky)-ReLU activation functions. Numerical experiments conducted using
real-world datasets illustrate the superior empirical performance of
e-TH$\varepsilon$O POULA compared to SGLD, TUSLA, ADAM, and AMSGrad in terms of
model accuracy.
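e-TH$\varepsilon$O POULA belongs to the Langevin family; a minimal sketch of SGLD (the baseline the paper benchmarks against) shows the update shape these algorithms share: a gradient step plus Gaussian noise scaled by $\sqrt{2\eta/\beta}$. The objective and parameters below are illustrative assumptions:

```python
import math, random

def sgld(grad, x0, steps, lr=0.01, beta=50.0, seed=0):
    """Stochastic gradient Langevin dynamics:
    x <- x - lr * g + sqrt(2 * lr / beta) * N(0, 1).
    The injected noise lets the chain explore instead of converging exactly."""
    rng = random.Random(seed)
    x = x0
    noise_scale = math.sqrt(2 * lr / beta)
    for _ in range(steps):
        x = x - lr * grad(x) + noise_scale * rng.gauss(0, 1)
    return x

# Sampling around the minimizer of f(x) = 0.5 x^2: iterates settle into a
# small noise ball near 0 rather than a single point.
samples = [sgld(lambda x: x, x0=5.0, steps=2000, seed=s) for s in range(20)]
```

The paper's contribution is precisely in taming such updates when the stochastic gradient is discontinuous (e.g. through ReLU networks), where plain SGLD can behave poorly.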
( 2
min )
Networks are ubiquitous in many real-world applications (e.g., social
networks encoding trust/distrust relationships, correlation networks arising
from time series data). While many networks are signed or directed, or both,
there is a lack of unified software packages on graph neural networks (GNNs)
specially designed for signed and directed networks. In this paper, we present
PyTorch Geometric Signed Directed (PyGSD), a software package which fills this
gap. Along the way, we evaluate the implemented methods with experiments with a
view to providing insights into which method to choose for a given task. The
deep learning framework consists of easy-to-use GNN models, synthetic and
real-world data, as well as task-specific evaluation metrics and loss functions
for signed and directed networks. As an extension library for PyG, our proposed
software is maintained with open-source releases, detailed documentation,
continuous integration, unit tests and code coverage checks. The GitHub
repository of the library is
https://github.com/SherylHYX/pytorch_geometric_signed_directed.
( 3
min )
Sequential neural posterior estimation (SNPE) techniques have been recently
proposed for dealing with simulation-based models with intractable likelihoods.
Unlike approximate Bayesian computation, SNPE techniques learn the posterior
from sequential simulation using neural network-based conditional density
estimators. This paper revisits SNPE-B, proposed by Lueckmann et al. (2017),
which suffers from inefficiency and slow inference due to poor utilization of
simulated data and high variance of parameter updates. To address these
issues, we first introduce a concentrated loss function based on an adaptive
calibration kernel that reweights the simulated data appropriately to improve
data efficiency. Moreover, we provide a
theoretical analysis of the variance of associated Monte Carlo estimators.
Based on this analysis, we then propose several variance reduction techniques
to further accelerate the process of learning. Numerical experiments
demonstrate that our method outperforms the original method together with other
existing competitors on certain tasks.
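The calibration-kernel idea — reweighting simulations by how close their summary statistic falls to the observation — can be sketched with a Gaussian kernel; the kernel form and fixed bandwidth are illustrative assumptions, not the paper's adaptive scheme:

```python
import math

def calibration_weights(sim_stats, obs_stat, bandwidth):
    """Gaussian calibration kernel: simulations whose summary statistic
    lands near the observation dominate the reweighted training loss."""
    w = [math.exp(-0.5 * ((s - obs_stat) / bandwidth) ** 2) for s in sim_stats]
    total = sum(w)
    return [wi / total for wi in w]            # normalized weights

sim_stats = [0.1, 0.5, 1.0, 2.0, 5.0]          # summaries of simulated datasets
weights = calibration_weights(sim_stats, obs_stat=1.0, bandwidth=1.0)
```

Simulations far from the observation (here the one at 5.0) receive negligible weight, concentrating the density estimator's capacity where it matters for the posterior at the observed data.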
( 2
min )
In real-world reinforcement learning problems, the state information is often
only partially observable, which breaks the basic assumption of Markov decision
processes and thus leads to inferior performance. Partially observable Markov
decision processes have been introduced to explicitly account for this issue in
learning, exploration, and planning, but they present
significant computational and statistical challenges. To address these
difficulties, we exploit the representation view, which leads to a coherent
design framework for a practically tractable reinforcement learning algorithm
upon partial observations. We provide a theoretical analysis for justifying the
statistical efficiency of the proposed algorithm. We also empirically
demonstrate that the proposed algorithm can surpass state-of-the-art
performance with partial observations across various benchmarks, thereby
pushing reliable reinforcement learning towards more practical applications.
( 2
min )
This is a guest post by A.K Roy from Qualcomm AI. Amazon Elastic Compute Cloud (Amazon EC2) DL2q instances, powered by Qualcomm AI 100 Standard accelerators, can be used to cost-efficiently deploy deep learning (DL) workloads in the cloud. They can also be used to develop and validate performance and accuracy of DL workloads that […]
( 9
min )
The financial service (FinServ) industry has unique generative AI requirements related to domain-specific data, data security, regulatory controls, and industry compliance standards. In addition, customers are looking for choices to select the most performant and cost-effective machine learning (ML) model and the ability to perform necessary customization (fine-tuning) to fit their business use cases. Amazon […]
( 11
min )
The IDP Well-Architected Lens is intended for all AWS customers who use AWS to run intelligent document processing (IDP) solutions and are searching for guidance on how to build secure, efficient, and reliable IDP solutions on AWS. Building a production-ready solution in the cloud involves a series of trade-offs between resources, time, customer expectation, and […]
( 14
min )
Building a production-ready solution in AWS involves a series of trade-offs between resources, time, customer expectation, and business outcome. The AWS Well-Architected Framework helps you understand the benefits and risks of decisions you make while building workloads on AWS. By using the Framework, you will learn current operational and architectural recommendations for designing and operating […]
( 11
min )
The IDP Well-Architected Custom Lens is intended for all AWS customers who use AWS to run intelligent document processing (IDP) solutions and are searching for guidance on how to build a secure, efficient, and reliable IDP solution on AWS. Building a production-ready solution in the cloud involves a series of trade-offs between resources, time, customer […]
( 13
min )
When a customer has a production-ready intelligent document processing (IDP) workload, we often receive requests for a Well-Architected review. To build an enterprise solution, developer resources, cost, time and user-experience have to be balanced to achieve the desired business outcome. The AWS Well-Architected Framework provides a systematic way for organizations to learn operational and architectural […]
( 10
min )
Building a production-ready solution in the cloud involves a series of trade-offs between resources, time, customer expectation, and business outcome. The AWS Well-Architected Framework helps you understand the benefits and risks of decisions you make while building workloads on AWS. An intelligent document processing (IDP) project usually combines optical character recognition (OCR) and natural language […]
( 13
min )
An intelligent document processing (IDP) project typically combines optical character recognition (OCR) and natural language processing (NLP) to automatically read and understand documents. Customers across all industries run IDP workloads on AWS to deliver business value by automating use cases such as KYC forms, tax documents, invoices, insurance claims, delivery reports, inventory reports, and more. […]
( 11
min )
For decades, Amazon has pioneered and innovated machine learning (ML), bringing delightful experiences to its customers. From the earliest days, Amazon has used ML for various use cases such as book recommendations, search, and fraud detection. Similar to the rest of the industry, the advancements of accelerated hardware have allowed Amazon teams to pursue model […]
( 11
min )
Today, geospatial workflows typically consist of loading data, transforming it, and then producing visual insights like maps, text, or charts. Generative AI can automate these tasks through autonomous agents. In this post, we discuss how to use foundation models from Amazon Bedrock to power agents to complete geospatial tasks. These agents can perform various tasks […]
( 11
min )
A new deep-learning compiler for dynamic sparsity; Tongue Tap could make tongue gestures viable for VR/AR headsets; Ranking LLM-Generated Loop Invariants for Program Verification; Assessing the limits of zero-shot foundation models in single-cell biology.
The post Research Focus: Week of November 22, 2023 appeared first on Microsoft Research.
( 10
min )
A calendar packed with meetings, calls and lab visits may sound like a typical workday for many — but for Luca Lofranco, whose greatest wish was to experience what it’s like to work at NVIDIA, it was a dream come true. Eighteen-year-old Lofranco recently traveled from his hometown near Toronto, Canada, to spend the day Read article >
( 6
min )
Talk about going after low-hanging fruit. Afresh is an AI startup that helps grocery stores and retailers reduce food waste by making supply chains more efficient. In the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with the company’s cofounder and president, Nathan Fenner, about its mission, offerings and the greater challenge of Read article >
( 5
min )
AI-based medical technologies, including wearables, telemedicine, LLMs, and
digital care twins, significantly impact healthcare. Ensuring AI results are
accurate and interpretable is crucial, especially for clinicians. This paper
reviews processes and challenges of interpretable ML (IML) and explainable AI
(XAI) in healthcare. Objectives include reviewing XAI processes, methods,
applications, and challenges, with a focus on quality control. The IML process
is classified into data pre-processing interpretability, interpretable
modeling, and post-processing interpretability. The paper aims to establish the
importance of robust interpretability in healthcare through experimental
results, providing insights for creating communicable clinician-AI tools.
Research questions, eligibility criteria, and goals were identified following
PRISMA and PICO methods. PubMed, Scopus, and Web of Science were systematically
searched using specific strings. The survey introduces a step-by-step roadmap
for implementing XAI in clinical applications, addressing existing gaps and
acknowledging XAI model limitations.
( 2
min )
This study introduces a novel forecasting strategy that leverages the power
of fractional differencing (FD) to capture both short- and long-term
dependencies in time series data. Unlike traditional integer differencing
methods, FD preserves memory in series while stabilizing it for modeling
purposes. By applying FD to financial data from the SPY index and incorporating
sentiment analysis from news reports, this empirical analysis explores the
effectiveness of FD in conjunction with binary classification of target
variables. Supervised classification algorithms were employed to validate the
performance of FD series. The results demonstrate the superiority of FD over
integer differencing, as confirmed by Receiver Operating Characteristic/Area
Under the Curve (ROCAUC) and Matthews Correlation Coefficient (MCC) evaluations.
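Fractional differencing itself is standard: the weights come from the binomial expansion of $(1-B)^d$ for non-integer $d$. A minimal fixed-window implementation:

```python
def frac_diff_weights(d, n):
    """Weights of the binomial expansion of (1 - B)^d:
    w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k."""
    w = [1.0]
    for k in range(1, n):
        w.append(-w[-1] * (d - k + 1) / k)
    return w

def frac_diff(series, d, window):
    """Fixed-window fractional differencing of a series."""
    w = frac_diff_weights(d, window)
    return [sum(w[k] * series[t - k] for k in range(window))
            for t in range(window - 1, len(series))]

w = frac_diff_weights(0.4, 5)              # slow weight decay: long memory kept
fd = frac_diff([1.0] * 10, 0.4, window=5)
```

With $d=1$ the weights truncate after one lag (ordinary differencing, memory discarded); a fractional $d$ like 0.4 keeps slowly decaying weights on past values — the memory-preservation property the study exploits.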
( 2
min )
The infinitely wide neural network has proven to be a useful and manageable
mathematical model that enables the understanding of many phenomena appearing
in deep learning. One example is the convergence of random deep networks to
Gaussian processes that allows a rigorous analysis of the way the choice of
activation function and network weights impacts the training dynamics. In this
paper, we extend the seminal proof of Matthews et al. (2018) to a larger class
of initial weight distributions (which we call PSEUDO-IID), including the
established cases of IID and orthogonal weights, as well as the emerging
low-rank and structured sparse settings celebrated for their computational
speed-up benefits. We show that fully-connected and convolutional networks
initialized with PSEUDO-IID distributions are all effectively equivalent up to
their variance. Using our results, one can identify the Edge-of-Chaos for a
broader class of neural networks and tune them at criticality in order to
enhance their training.
( 2
min )
Metarounding is an approach to convert an approximation algorithm for linear
optimization over some combinatorial classes to an online linear optimization
algorithm for the same class. We propose a new metarounding algorithm under a
natural assumption that a relax-based approximation algorithm exists for the
combinatorial class. Our algorithm is much more efficient in both theoretical
and practical aspects.
( 2
min )
Text-based game environments are challenging because agents must deal with
long sequences of text, execute compositional actions using text and learn from
sparse rewards. We address these challenges by proposing Language Decision
Transformers (LDTs), a framework that is based on transformer language models
and decision transformers (DTs). Our LDTs extend DTs with 3 components: (1)
exponential tilt to guide the agent towards high obtainable goals, (2) novel
goal conditioning methods yielding better results than the traditional
return-to-go (sum of all future rewards), and (3) a model of future
observations that improves agent performance. LDTs are the first to address
offline RL with DTs on these challenging games. Our experiments show that LDTs
achieve the highest scores among many different types of agents on some of the
most challenging Jericho games, such as Enchanter.
( 2
min )
There is no convincing evidence that backpropagation is a biologically
plausible mechanism, and further studies of alternative learning methods are
needed. A novel online clustering algorithm is presented that can produce
arbitrary shaped clusters from inputs in an unsupervised manner, and requires
no prior knowledge of the number of clusters in the input data. This is
achieved by finding correlated outputs from functions that capture commonly
occurring input patterns. The algorithm can be deemed more biologically
plausible than model optimization through backpropagation, although practical
applicability may require additional research. Nevertheless, the method yields
satisfactory results on several toy datasets across a noteworthy range of
hyperparameters.
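The paper's correlation-based rule is not reproduced here, but the online setting — no preset number of clusters, arbitrary arrival order — can be sketched with a simple leader-style clusterer (the distance threshold is an assumption of this sketch):

```python
def online_cluster(points, radius):
    """Leader-style online clustering: join the nearest centroid within
    `radius`, otherwise open a new cluster. No preset cluster count."""
    centroids, counts, labels = [], [], []
    for x, y in points:
        best, best_d = None, radius
        for i, (cx, cy) in enumerate(centroids):
            d = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            if d <= best_d:
                best, best_d = i, d
        if best is None:                       # nothing close: new cluster
            centroids.append((x, y))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:                                  # running-mean centroid update
            n = counts[best] + 1
            cx, cy = centroids[best]
            centroids[best] = (cx + (x - cx) / n, cy + (y - cy) / n)
            counts[best] = n
            labels.append(best)
    return labels, centroids

pts = [(0, 0), (0.1, 0.1), (5, 5), (5.1, 4.9), (0.05, -0.1)]
labels, cents = online_cluster(pts, radius=1.0)
```

Each point is seen once and cluster count emerges from the data, the two properties the abstract emphasizes; the paper replaces the distance rule with correlated outputs of pattern-capturing functions.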
( 2
min )
Successful deployment of multi-agent reinforcement learning often requires
agents to adapt their behaviour. In this work, we discuss the problem of
teamwork adaptation in which a team of agents needs to adapt their policies to
solve novel tasks with limited fine-tuning. Motivated by the intuition that
agents need to be able to identify and distinguish tasks in order to adapt
their behaviour to the current task, we propose to learn multi-agent task
embeddings (MATE). These task embeddings are trained using an encoder-decoder
architecture optimised for reconstruction of the transition and reward
functions which uniquely identify tasks. We show that a team of agents is able
to adapt to novel tasks when provided with task embeddings. We propose three
MATE training paradigms: independent MATE, centralised MATE, and mixed MATE,
which vary in the information used for the task encoding. We show that the
embeddings learned by MATE identify tasks and provide useful information which
agents leverage during adaptation to novel tasks.
( 2
min )
In this paper, we introduce a novel and computationally efficient method for
vertex embedding, community detection, and community size determination. Our
approach leverages a normalized one-hot graph encoder and a rank-based cluster
size measure. Through extensive simulations, we demonstrate the excellent
numerical performance of our proposed graph encoder ensemble algorithm.
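The flavor of a one-hot graph encoder embedding can be sketched as follows: each vertex is embedded by the normalized count of its neighbors in each community. The degree normalization here is a simplification of the paper's scheme, and the toy graph is an assumption:

```python
def one_hot_graph_encoder(edges, labels, n_communities):
    """Embed each vertex as the fraction of its neighbors in each community
    (a normalized one-hot projection of the adjacency structure)."""
    n = len(labels)
    emb = [[0.0] * n_communities for _ in range(n)]
    deg = [0] * n
    for u, v in edges:                 # undirected edges, counted both ways
        emb[u][labels[v]] += 1
        deg[u] += 1
        emb[v][labels[u]] += 1
        deg[v] += 1
    for i in range(n):
        if deg[i]:
            emb[i] = [c / deg[i] for c in emb[i]]
    return emb

# Two tight communities joined by a single bridge edge (toy example):
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = [0, 0, 0, 1, 1, 1]
emb = one_hot_graph_encoder(edges, labels, 2)
```

The embedding is a single sparse pass over the edge list, which is what makes the encoder ensemble computationally cheap.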
( 2
min )
In the present work, we introduce a novel approach to enhance the precision
of reduced order models by exploiting a multi-fidelity perspective and
DeepONets. Reduced models provide a real-time numerical approximation by
simplifying the original model. The error introduced by such a simplification
is usually neglected and sacrificed in order to reach a fast computation. We
propose coupling the model reduction with machine-learning-based residual
learning, such that the above-mentioned error can be learned by a neural
network and inferred for new predictions. We emphasize that the framework maximizes the
exploitation of high-fidelity information, using it for building the reduced
order model and for learning the residual. In this work, we explore the
integration of proper orthogonal decomposition (POD), and gappy POD for sensors
data, with the recent DeepONet architecture. Numerical investigations for a
parametric benchmark function and a nonlinear parametric Navier-Stokes problem
are presented.
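The POD-plus-residual pipeline can be illustrated with a plain SVD; the snapshot data below is synthetic, and the residual is what a DeepONet-style network would then be trained to predict.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical snapshot matrix: each column is one high-fidelity solution
# (rank 3 by construction, so three POD modes capture it exactly)
snapshots = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 40))

# POD: truncated SVD of the snapshot matrix
U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)
r = 3
basis = U[:, :r]                       # reduced-order basis
coeffs = basis.T @ snapshots           # modal coefficients
reconstruction = basis @ coeffs        # ROM approximation
residual = snapshots - reconstruction  # the error a network would learn
```

In the multi-fidelity framing, the same high-fidelity snapshots are used twice: once to build `basis`, and once to supervise the residual model.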
( 2
min )
Carefully standardized facial images of 591 participants were taken in the
laboratory, while controlling for self-presentation, facial expression, head
orientation, and image properties. They were presented to human raters and a
facial recognition algorithm: both humans (r=.21) and the algorithm (r=.22)
could predict participants' scores on a political orientation scale (Cronbach's
alpha=.94) decorrelated with age, gender, and ethnicity. These effects are on
par with how well job interviews predict job success, or alcohol drives
aggressiveness. The algorithm's predictive accuracy was even higher (r=.31) when it
leveraged information on participants' age, gender, and ethnicity. Moreover,
the associations between facial appearance and political orientation seem to
generalize beyond our sample: The predictive model derived from standardized
images (while controlling for age, gender, and ethnicity) could predict
political orientation (r=.13) from naturalistic images of 3,401 politicians
from the U.S., UK, and Canada. The analysis of facial features associated with
political orientation revealed that conservatives tended to have larger lower
faces. The predictability of political orientation from standardized images has
critical implications for privacy, the regulation of facial recognition
technology, and understanding the origins and consequences of political
orientation.
( 3
min )
The rapid mutation of the influenza virus threatens public health.
Reassortment among viruses with different hosts can lead to a fatal pandemic.
However, it is difficult to detect the original host of the virus during or
after an outbreak as influenza viruses can circulate between different species.
Therefore, early and rapid detection of the viral host would help reduce the
further spread of the virus. We use various machine learning models with
features derived from the position-specific scoring matrix (PSSM) and features
learned from word embedding and word encoding to infer the origin host of
viruses. The results show that the PSSM-based model achieves an MCC of around
95% and an F1 of around 96%, while the model with word embedding achieves an
MCC of around 96% and an F1 of around 97%.
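As a hedged illustration of the PSSM features mentioned above (a toy nucleotide alphabet and uniform background of our own choosing; real influenza work uses amino-acid alignments and tuned pseudocounts):

```python
import numpy as np

ALPHABET = "ACGT"  # toy alphabet for illustration only

def pssm(sequences):
    """Position-specific scoring matrix: log2-odds of each symbol at each
    position versus a uniform background, with a pseudocount of 1."""
    L = len(sequences[0])
    counts = np.ones((len(ALPHABET), L))  # pseudocounts
    for seq in sequences:
        for j, ch in enumerate(seq):
            counts[ALPHABET.index(ch), j] += 1
    freqs = counts / counts.sum(axis=0, keepdims=True)
    return np.log2(freqs / (1.0 / len(ALPHABET)))

M = pssm(["ACGT", "ACGA", "ACGT"])
```

Positive entries mark symbols enriched at a position relative to background; flattened, such matrices become feature vectors for the classifiers.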
( 2
min )
Modern time series classifiers display impressive predictive capabilities,
yet their decision-making processes mostly remain black boxes to the user. At
the same time, model-agnostic explainers, such as the recently proposed SHAP,
promise to make the predictions of machine learning models interpretable,
provided there are well-designed domain mappings. We bring both worlds together
in our timeXplain framework, extending the reach of explainable artificial
intelligence to time series classification and value prediction. We present
novel domain mappings for the time domain, frequency domain, and time series
statistics and analyze their explicative power as well as their limits. We
employ a novel evaluation metric to experimentally compare timeXplain to
several model-specific explanation approaches for state-of-the-art time series
classifiers.
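One way to picture a time-domain mapping of the kind described here (this masking rule is our own simplification, not the paper's exact mapping): split the series into contiguous slices, and "switch off" a slice by replacing it with a background value such as the series mean, so that a SHAP-style explainer can measure each slice's contribution.

```python
import numpy as np

def mask_slices(x, active, n_slices):
    """Time-domain mapping: split x into n_slices contiguous segments and
    replace every inactive segment with the global mean of the series."""
    out = x.copy()
    bounds = np.linspace(0, len(x), n_slices + 1).astype(int)
    for k in range(n_slices):
        if not active[k]:
            out[bounds[k]:bounds[k + 1]] = x.mean()
    return out

x = np.arange(8, dtype=float)          # series with mean 3.5
masked = mask_slices(x, [True, False], 2)
```

Feeding such masked variants to a fixed classifier yields the coalition payoffs that Shapley-value estimation needs.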
( 2
min )
In this paper, we explore the structure of the penultimate Gram matrix in
deep neural networks, which contains the pairwise inner products of outputs
corresponding to a batch of inputs. In several architectures it has been
observed that this Gram matrix becomes degenerate with depth at initialization,
which dramatically slows training. Normalization layers, such as batch or layer
normalization, play a pivotal role in preventing the rank collapse issue.
Despite promising advances, the existing theoretical results do not extend to
layer normalization, which is widely used in transformers, and can not
quantitatively characterize the role of non-linear activations. To bridge this
gap, we prove that layer normalization, in conjunction with activation layers,
biases the Gram matrix of a multilayer perceptron towards the identity matrix
at an exponential rate with depth at initialization. We quantify this rate
using the Hermite expansion of the activation function.
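A small self-contained simulation (ours, not the paper's) of the claimed effect: with layer normalization applied after each ReLU layer, the normalized Gram matrix of two nearly identical inputs keeps a unit diagonal while the off-diagonal entry shrinks with depth instead of collapsing to 1.

```python
import numpy as np

def layer_norm(H):
    """Per-sample layer normalization: zero mean, unit variance across units."""
    mu = H.mean(axis=1, keepdims=True)
    sd = H.std(axis=1, keepdims=True)
    return (H - mu) / sd

rng = np.random.default_rng(0)
width, depth = 512, 20
x = rng.standard_normal(width)
X = np.stack([x, x + 0.1 * rng.standard_normal(width)])  # highly correlated pair

H = layer_norm(X)
for _ in range(depth):
    W = rng.standard_normal((width, width)) / np.sqrt(width)
    H = layer_norm(np.maximum(H @ W, 0.0))  # ReLU, then layer norm

gram = (H @ H.T) / width  # normalized penultimate Gram matrix
```

Layer normalization pins the diagonal at 1 exactly; the off-diagonal decay toward 0 is the bias toward the identity that the paper quantifies via Hermite coefficients.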
( 2
min )
Despite the recent advancements in offline reinforcement learning via
supervised learning (RvS) and the success of the decision transformer (DT)
architecture in various domains, DTs have fallen short in several challenging
benchmarks. The root cause of this underperformance lies in their inability to
seamlessly connect segments of suboptimal trajectories. To overcome this
limitation, we present a novel approach to enhance RvS methods by integrating
intermediate targets. We introduce the Waypoint Transformer (WT), using an
architecture that builds upon the DT framework and conditioned on
automatically-generated waypoints. The results show a significant increase in
the final return compared to existing RvS methods, with performance on par or
greater than existing state-of-the-art temporal difference learning-based
methods. Additionally, the performance and stability improvements are largest
in the most challenging environments and data configurations, including AntMaze
Large Play/Diverse and Kitchen Mixed/Partial.
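The notion of intermediate targets can be illustrated with the simplest possible waypoint generator (plain interpolation between state and goal; the actual Waypoint Transformer generates its waypoints with a learned network, which this does not reproduce):

```python
import numpy as np

def waypoints(start, goal, k):
    """Hypothetical intermediate-target generator: k evenly spaced
    waypoints strictly between the current state and the goal."""
    t = np.linspace(0.0, 1.0, k + 2)[1:-1, None]
    return (1 - t) * start + t * goal

w = waypoints(np.array([0.0, 0.0]), np.array([4.0, 8.0]), 3)
```

Conditioning the policy on the nearest such target, rather than on the distant final return, is what lets segments of suboptimal trajectories be stitched together.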
( 2
min )
In this paper, we extend an available neural network verification technique
to support a wider class of piece-wise linear activation functions.
Furthermore, we extend the algorithms, which in their original form provide
exact and over-approximative results, respectively, for bounded input sets
represented as star sets, to also allow unbounded input sets. We implemented
our algorithms and demonstrated their effectiveness in several case studies.
( 2
min )
In today's rapidly evolving educational landscape, traditional modes of
passive information delivery are giving way to transformative pedagogical
approaches that prioritize active student engagement. Within the context of
large-scale hybrid classrooms, the challenge lies in fostering meaningful and
active interaction between students and course content. This study delves into
the significance of measuring students' earnestness during interactive lecture
participation exercises. By analyzing students' responses to interactive
lecture poll questions, establishing a clear rubric for evaluating earnestness,
and conducting a comprehensive assessment, we introduce EIT (Earnest Insight
Toolkit), a tool designed to assess students' engagement within interactive
lecture participation exercises - particularly in the context of large-scale
hybrid classrooms. Through the utilization of EIT, our objective is to equip
educators with a valuable means of identifying at-risk students, enhancing
intervention and support strategies, and measuring students' levels of
engagement with course content.
( 2
min )
According to the literature, product reviews are an important source of
information for customers to support their buying decisions. Product reviews
improve customer trust and loyalty. Reviews help customers understand
what other customers think about a particular product and help drive
purchase decisions. Therefore, for an e-commerce platform it is important to
understand the sentiments in customer reviews to understand their products and
services, and it also allows them to potentially create positive consumer
interaction as well as long lasting relationships. Reviews also provide
innovative ways to market products for an e-commerce company. One such
approach is nudge marketing: a subtle way for an e-commerce company to help
its customers make better decisions without hesitation.
( 2
min )
In sparse linear bandits, a learning agent sequentially selects an action and
receives reward feedback, and the reward function depends linearly on a few
coordinates of the covariates of the actions. This has applications in many
real-world sequential decision making problems. In this paper, we propose a
simple and computationally efficient sparse linear estimation method called
PopArt that enjoys a tighter $\ell_1$ recovery guarantee compared to Lasso
(Tibshirani, 1996) in many problems. Our bound naturally motivates an
experimental design criterion that is convex and thus computationally efficient
to solve. Based on our novel estimator and design criterion, we derive sparse
linear bandit algorithms that enjoy improved regret upper bounds upon the state
of the art (Hao et al., 2020), especially w.r.t. the geometry of the given
action set. Finally, we prove a matching lower bound for sparse linear bandits
in the data-poor regime, which closes the gap between upper and lower bounds in
prior work.
( 2
min )
The infinitely wide neural network has been proven a useful and manageable
mathematical model that enables the understanding of many phenomena appearing
in deep learning. One example is the convergence of random deep networks to
Gaussian processes that allows a rigorous analysis of the way the choice of
activation function and network weights impacts the training dynamics. In this
paper, we extend the seminal proof of Matthews et al. (2018) to a larger class
of initial weight distributions (which we call PSEUDO-IID), including the
established cases of IID and orthogonal weights, as well as the emerging
low-rank and structured sparse settings celebrated for their computational
speed-up benefits. We show that fully-connected and convolutional networks
initialized with PSEUDO-IID distributions are all effectively equivalent up to
their variance. Using our results, one can identify the Edge-of-Chaos for a
broader class of neural networks and tune them at criticality in order to
enhance their training.
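The convergence to a Gaussian process gives a kernel recursion over layers; below is the standard arc-cosine (ReLU) NNGP layer map, shown as general background for this line of work rather than anything specific to the PSEUDO-IID analysis.

```python
import numpy as np

def relu_nngp_layer(K, sigma_w2=2.0):
    """One layer of the NNGP kernel recursion for ReLU (arc-cosine kernel).
    With sigma_w^2 = 2 (He scaling) the diagonal is preserved exactly."""
    d = np.sqrt(np.outer(np.diag(K), np.diag(K)))
    cos = np.clip(K / d, -1.0, 1.0)
    theta = np.arccos(cos)
    return sigma_w2 / (2 * np.pi) * d * (np.sin(theta) + (np.pi - theta) * cos)

X = np.array([[1.0, 0.0], [0.0, 1.0]])
K = X @ X.T                      # input kernel
for _ in range(5):               # propagate through 5 infinite-width layers
    K = relu_nngp_layer(K)
```

Tracking how off-diagonal correlations evolve under this map is exactly the Edge-of-Chaos analysis the abstract refers to.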
( 2
min )
In this work, we investigate the problem of public data-assisted
non-interactive LDP (Local Differential Privacy) learning with a focus on
non-parametric classification. Under the posterior drift assumption, we
derive, for the first time, the minimax optimal convergence rate under the LDP
constraint. Then, we present a novel approach, the locally private
classification tree, which attains this minimax optimal rate. Furthermore, we
design a
data-driven pruning procedure that avoids parameter tuning and produces a fast
converging estimator. Comprehensive experiments conducted on synthetic and real
datasets show the superior performance of our proposed method. Both our
theoretical and experimental findings demonstrate the effectiveness of public
data compared to private data, which leads to practical suggestions for
prioritizing non-private data collection.
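For background, the canonical non-interactive LDP primitive is randomized response; a sketch with a debiased frequency estimate (the parameters are illustrative, not the paper's):

```python
import numpy as np

def randomized_response(bit, epsilon, rng):
    """Classic LDP mechanism: report the true bit with probability
    e^eps / (e^eps + 1), otherwise report its flip."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return bit if rng.random() < p else 1 - bit

rng = np.random.default_rng(0)
eps = 1.0
true_bits = rng.integers(0, 2, 100_000)
reports = np.array([randomized_response(b, eps, rng) for b in true_bits])

# Debias: E[report] = (2p - 1) * mean + (1 - p)
p = np.exp(eps) / (np.exp(eps) + 1.0)
est = (reports.mean() - (1 - p)) / (2 * p - 1)
```

The variance inflation of this debiasing step is why supplementing private reports with public data, as the paper advocates, is so valuable.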
( 2
min )
We study the mean field Langevin dynamics and the associated particle system.
By assuming the functional convexity of the energy, we obtain the
$L^p$-convergence of the marginal distributions towards the unique invariant
measure for the mean field dynamics. Furthermore, we prove the uniform-in-time
propagation of chaos in both the $L^2$-Wasserstein metric and relative entropy.
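A toy particle system makes the setting concrete (a quadratic confining potential plus a mean-field attraction term of our own choosing, discretized with Euler-Maruyama):

```python
import numpy as np

rng = np.random.default_rng(0)
N, dt, steps, lam = 10_000, 0.01, 3_000, 0.5
X = 3.0 * rng.standard_normal(N)   # particles, started far from equilibrium

for _ in range(steps):
    # Confining drift -X plus mean-field attraction toward the empirical mean
    drift = -X - lam * (X - X.mean())
    X = X + drift * dt + np.sqrt(2 * dt) * rng.standard_normal(N)
```

As N grows, the empirical law of such interacting particles tracks the mean field Langevin dynamics; the uniform-in-time propagation of chaos result controls this gap for all times at once.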
( 2
min )
In domains where sample sizes are limited, efficient learning algorithms are
critical. Learning using privileged information (LuPI) offers increased sample
efficiency by allowing prediction models access to auxiliary information at
training time which is unavailable when the models are used. In recent work, it
was shown that for prediction in linear-Gaussian dynamical systems, a LuPI
learner with access to intermediate time series data is never worse and often
better in expectation than any unbiased classical learner. We provide new
insights into this analysis and generalize it to nonlinear prediction tasks in
latent dynamical systems, extending theoretical guarantees to the case where
the map connecting latent variables and observations is known up to a linear
transform. In addition, we propose algorithms based on random features and
representation learning for the case when this map is unknown. A suite of
empirical results confirm theoretical findings and show the potential of using
privileged time-series information in nonlinear prediction.
( 2
min )
The ability to construct a realistic simulator of financial exchanges,
including reproducing the dynamics of the limit order book, can give insight
into many counterfactual scenarios, such as a flash crash, a margin call, or
changes in macroeconomic outlook. In recent years, agent-based models have been
developed that reproduce many features of an exchange, as summarised by a set
of stylised facts and statistics. However, the ability to calibrate simulators
to a specific period of trading remains an open challenge. In this work, we
develop a novel approach to the calibration of market simulators by leveraging
recent advances in deep learning, specifically using neural density estimators
and embedding networks. We demonstrate that our approach is able to correctly
identify high probability parameter sets, both when applied to synthetic and
historical data, and without reliance on manually selected or weighted
ensembles of stylised facts.
( 2
min )
We consider the problem of linear estimation, and establish an extension of
the Gauss-Markov theorem, in which the bias operator is allowed to be non-zero
but bounded with respect to a matrix norm of Schatten type. We derive simple
and explicit formulas for the optimal estimator in the cases of Nuclear and
Spectral norms (with the Frobenius case recovering ridge regression).
Additionally, we analytically derive the generalization error in multiple
random matrix ensembles, and compare with Ridge regression. Finally, we conduct
an extensive simulation study, in which we show that the cross-validated
Nuclear and Spectral regressors can outperform Ridge in several circumstances.
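For reference, the Frobenius special case has the familiar ridge closed form (synthetic data; the Nuclear- and Spectral-norm estimators from the paper are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta + 0.1 * rng.standard_normal(n)

lam = 1.0
# Ridge: beta_hat = (X'X + lam I)^{-1} X'y, the Frobenius-norm-bias case
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

The paper's contribution is that replacing the Frobenius constraint on the bias operator with Nuclear or Spectral Schatten norms still yields explicit formulas of this kind.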
( 2
min )
A new exploratory technique called biarchetype analysis is defined. We extend
archetype analysis to find the archetypes of both observations and features
simultaneously. The idea of this new unsupervised machine learning tool is to
represent observations and features by instances of pure types (biarchetypes)
that can be easily interpreted as they are mixtures of observations and
features. Furthermore, the observations and features are expressed as mixtures
of the biarchetypes, which also helps understand the structure of the data. We
propose an algorithm to solve biarchetype analysis. We show that biarchetype
analysis offers advantages over biclustering, especially in terms of
interpretability. This is because biarchetypes are extreme instances as opposed
to the centroids returned by biclustering, which favors human understanding.
Biarchetype analysis is applied to several machine learning problems to
illustrate its usefulness.
( 2
min )
A multitude of (dis)similarity measures between neural network
representations have been proposed, resulting in a fragmented research
landscape. Most of these measures fall into one of two categories.
First, measures such as linear regression, canonical correlation analysis
(CCA), and shape distances, all learn explicit mappings between neural units to
quantify similarity while accounting for expected invariances. Second, measures
such as representational similarity analysis (RSA), centered kernel alignment
(CKA), and normalized Bures similarity (NBS) all quantify similarity in summary
statistics, such as stimulus-by-stimulus kernel matrices, which are already
invariant to expected symmetries. Here, we take steps towards unifying these
two broad categories of methods by observing that the cosine of the Riemannian
shape distance (from category 1) is equal to NBS (from category 2). We explore
how this connection leads to new interpretations of shape distances and NBS,
and draw contrasts of these measures with CKA, a popular similarity measure in
the deep learning literature.
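The summary-statistic flavour of these measures is easy to make concrete with linear CKA (the centered-Gram-matrix formula; NBS and shape distances involve an additional matrix square root not shown here):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (samples x units),
    computed from centered Gram matrices; invariant to orthogonal maps."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    K, L = Xc @ Xc.T, Yc @ Yc.T
    return (K * L).sum() / (np.linalg.norm(K) * np.linalg.norm(L))

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
R, _ = np.linalg.qr(rng.standard_normal((10, 10)))  # random rotation
```

Rotating the units of A leaves `linear_cka(A, A @ R)` at 1, illustrating the invariance to expected symmetries that all category-2 measures share.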
( 2
min )
A way to counter AI risks could be to create AI risks. The question is by whom: a non-profit, a corporation, a nation, or a treaty? It may take extremes in systems, across tasks, to find out the depths of threats. If AI is used in weaponry, what are all the possible ways, such that… Read More »Why generative AI safety research is beyond alignment
The post Why generative AI safety research is beyond alignment appeared first on Data Science Central.
( 21
min )
In the dynamic world of streaming on Amazon Music, every search for a song, podcast, or playlist holds a story, a mood, or a flood of emotions waiting to be unveiled. These searches serve as a gateway to new discoveries, cherished experiences, and lasting memories. The search bar is not just about finding a song; […]
( 10
min )
This post is written in collaboration with Brad Duncan, Rachel Johnson and Richard Alcock from MathWorks. MATLAB is a popular programming tool for a wide range of applications, such as data processing, parallel computing, automation, simulation, machine learning, and artificial intelligence. It’s heavily used in many industries such as automotive, aerospace, communication, and manufacturing. In […]
( 10
min )
In this post, we demonstrate how to use the SageMaker Python SDK for text embedding and sentence similarity. Sentence similarity involves assessing the likeness between two pieces of text after they are converted into embeddings by the LLM, which is a foundation step for applications like Retrieval Augmented Generation (RAG).
( 10
min )
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. AnalyzeDocument Layout is a new feature that allows customers to automatically extract layout elements such as paragraphs, titles, subtitles, headers, footers, and more from documents. Layout extends Amazon Textract’s word and line detection by automatically […]
( 14
min )
Twelve teams of students and postdocs across the MIT community presented innovative startup ideas with potential for real-world impact.
( 11
min )
Genentech, a member of the Roche Group, is pioneering the use of generative AI to discover and develop new therapeutics and deliver treatments to patients more efficiently. A new collaboration between Genentech, the biotechnology pioneer, and NVIDIA aims to transform the discovery and development of new medicines by bringing together experts from each company to Read article >
( 6
min )
It’s the season of gratitude: that time of year to give thanks for the people and small moments that make life so special.
( 7
min )
So lately I've been getting a kick out of asking DALL-E3 for images labeled with text. They're just good enough to be legible, but yet:
The food that gets duplicated seems to vary from spread to spread.
I also asked DALL-E 3 to do the dessert
( 4
min )
AI Weirdness: the strange side of machine learning
( 2
min )
At Microsoft, we’re expanding AI capabilities by training small language models to achieve the kind of enhanced reasoning and comprehension typically found only in much larger models.
The post Orca 2: Teaching Small Language Models How to Reason appeared first on Microsoft Research.
( 10
min )
This work presents an analysis of the effectiveness of using standard shallow
feed-forward networks to mimic the behavior of the attention mechanism in the
original Transformer model, a state-of-the-art architecture for
sequence-to-sequence tasks. We substitute key elements of the attention
mechanism in the Transformer with simple feed-forward networks, trained using
the original components via knowledge distillation. Our experiments, conducted
on the IWSLT2017 dataset, reveal the capacity of these "attentionless
Transformers" to rival the performance of the original architecture. Through
rigorous ablation studies, and experimenting with various replacement network
types and sizes, we offer insights that support the viability of our approach.
This not only sheds light on the adaptability of shallow feed-forward networks
in emulating attention mechanisms but also underscores their potential to
streamline complex architectures for sequence-to-sequence tasks.
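The distillation setup can be caricatured in a few lines (a toy attention teacher and a linear least-squares student of our own construction; the paper trains real feed-forward networks on IWSLT2017, which this does not attempt):

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(X, Wq, Wk, Wv):
    """Toy single-head self-attention teacher on a (T, d) sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    S = Q @ K.T / np.sqrt(K.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return P @ V

T, d, N = 4, 3, 200
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Xs = rng.standard_normal((N, T, d))
Ys = np.stack([attention(X, Wq, Wk, Wv) for X in Xs])  # distillation targets

# "Attentionless" student: one linear map on the flattened sequence,
# fit to the teacher's outputs by least squares
A_in = Xs.reshape(N, -1)
A_out = Ys.reshape(N, -1)
W, *_ = np.linalg.lstsq(A_in, A_out, rcond=None)
pred = A_in @ W
```

The student sees whole flattened sequences precisely because, unlike attention, a fixed feed-forward map cannot itself mix information across token positions.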
( 2
min )
Federated Learning (FL) enables collaborative machine learning model training
across multiple parties without sharing raw data. However, FL's distributed
nature allows malicious clients to impact model training through Byzantine or
backdoor attacks, using erroneous model updates. Existing defenses measure the
deviation of each update from a 'ground-truth model update.' They often rely on
a benign root dataset on the server or use the trimmed mean or median for
clipping, both of which have limitations.
We introduce FedTruth, a robust defense against model poisoning in FL.
FedTruth neither assumes specific data distributions nor requires a benign
root dataset. It estimates a global model update with dynamic aggregation
weights,
considering contributions from all benign clients. Empirical studies
demonstrate FedTruth's efficacy in mitigating the impacts of poisoned updates
from both Byzantine and backdoor attacks.
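A toy robust-aggregation sketch in the spirit of dynamic aggregation weights, though not FedTruth's exact rule (what follows is essentially a Weiszfeld-style reweighting toward a geometric median of client updates):

```python
import numpy as np

def aggregate(updates, n_iters=5):
    """Iteratively reweight client updates: weights shrink with distance
    to the current estimate, so outlying (poisoned) updates lose influence."""
    est = updates.mean(axis=0)
    for _ in range(n_iters):
        dist = np.linalg.norm(updates - est, axis=1)
        w = 1.0 / (dist + 1e-8)
        w /= w.sum()
        est = w @ updates
    return est

rng = np.random.default_rng(0)
benign = rng.normal(1.0, 0.1, size=(9, 4))   # honest updates near 1
poisoned = np.full((1, 4), -10.0)            # one Byzantine update
est = aggregate(np.vstack([benign, poisoned]))
```

The aggregate lands near the benign cluster without ever labelling individual clients as malicious, which is the qualitative behaviour such defenses aim for.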
( 2
min )
Decades of research indicate that emotion recognition is more effective when
drawing information from multiple modalities. But what if some modalities are
sometimes missing? To address this problem, we propose a novel
Transformer-based architecture for recognizing valence and arousal in a
time-continuous manner even with missing input modalities. We use a coupling of
cross-attention and self-attention mechanisms to emphasize relationships
between modalities during time and enhance the learning process on weak salient
inputs. Experimental results on the Ulm-TSST dataset show that our model
exhibits an improvement of the concordance correlation coefficient evaluation
of 37% when predicting arousal values and 30% when predicting valence values,
compared to a late-fusion baseline approach.
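The evaluation metric used here, the concordance correlation coefficient (Lin, 1989), is simple to state:

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient: penalizes both poor
    correlation and systematic shifts in mean or scale."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

t = np.linspace(0.0, 1.0, 100)
```

Unlike plain Pearson correlation, CCC reaches 1 only when predictions match the targets in trend, mean, and scale, which is why it is standard for time-continuous valence/arousal evaluation.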
( 2
min )
Online High Definition Map (HDMap) estimation from sensors offers a low-cost
alternative to manually acquired HDMaps. As such, it promises to lighten costs
for already HDMap-reliant Autonomous Driving systems, and potentially even
spread their use to new systems. In this paper, we propose to improve online
HDMap estimation by accounting for already existing maps. We identify 3
reasonable types of useful existing maps (minimalist, noisy, and outdated). We
also introduce MapEX, a novel online HDMap estimation framework that accounts
for existing maps. MapEX achieves this by encoding map elements into query
tokens and by refining the matching algorithm used to train classic query based
map estimation models. We demonstrate that MapEX brings significant
improvements on the nuScenes dataset. For instance, MapEX - given noisy maps -
improves by 38% over the MapTRv2 detector it is based on and by 16% over the
current SOTA.
( 2
min )
Despite the widespread use and success of machine-learning techniques for
detecting phase transitions from data, their working principle and fundamental
limits remain elusive. Here, we explain the inner workings and identify
potential failure modes of these techniques by rooting popular machine-learning
indicators of phase transitions in information-theoretic concepts. Using tools
from information geometry, we prove that several machine-learning indicators of
phase transitions approximate the square root of the system's (quantum) Fisher
information from below -- a quantity that is known to indicate phase
transitions but is often difficult to compute from data. We numerically
demonstrate the quality of these bounds for phase transitions in classical and
quantum systems.
( 2
min )
Graph neural networks have been successful for machine learning, as well as
for combinatorial and graph problems such as the Subgraph Isomorphism Problem
and the Traveling Salesman Problem. We describe an approach for computing graph
sparsifiers by combining a graph neural network and Monte Carlo Tree Search. We
first train a graph neural network that takes as input a partial solution and
proposes a new node to be added as output. This neural network is then used in
a Monte Carlo search to compute a sparsifier. The proposed method consistently
outperforms several standard approximation algorithms on different types of
graphs and often finds the optimal solution.
( 2
min )
Tabular classification has traditionally relied on supervised algorithms,
which estimate the parameters of a prediction model using its training data.
Recently, Prior-Data Fitted Networks (PFNs) such as TabPFN have successfully
learned to classify tabular data in-context: the model parameters are designed
to classify new samples based on labelled training samples given after the
model training. While such models show great promise, their applicability to
real-world data remains limited due to the computational scale needed. Here we
study the following question: given a pre-trained PFN for tabular data, what is
the best way to summarize the labelled training samples before feeding them to
the model? We conduct an initial investigation of sketching and
feature-selection methods for TabPFN, and note certain key differences between
it and conventionally fitted tabular models.
( 2
min )
Uncovering the mechanisms behind long-term memory is one of the most
fascinating open problems in neuroscience and artificial intelligence.
Artificial associative memory networks have been used to formalize important
aspects of biological memory. Generative diffusion models are a type of
generative machine learning techniques that have shown great performance in
many tasks. Like associative memory systems, these networks define a dynamical
system that converges to a set of target states. In this work we show that
generative diffusion models can be interpreted as energy-based models and that,
when trained on discrete patterns, their energy function is (asymptotically)
identical to that of modern Hopfield networks. This equivalence allows us to
interpret the supervised training of diffusion models as a synaptic learning
process that encodes the associative dynamics of a modern Hopfield network in
the weight structure of a deep neural network. Leveraging this connection, we
formulate a generalized framework for understanding the formation of long-term
memory, where creative generation and memory recall can be seen as parts of a
unified continuum.
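The modern Hopfield retrieval dynamics referenced here have a compact softmax form (a standard textbook sketch, not the paper's diffusion-model training):

```python
import numpy as np

def hopfield_update(xi, patterns, beta=8.0):
    """One update of a modern (softmax) Hopfield network: the query xi is
    pulled toward the stored pattern with the largest inner product."""
    scores = beta * patterns @ xi
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return p @ patterns

patterns = np.array([[1.0, 1.0, -1.0, -1.0],
                     [1.0, -1.0, 1.0, -1.0]])
query = np.array([1.0, 1.0, -1.0, 1.0])   # corrupted copy of pattern 0
out = hopfield_update(query, patterns)
```

The paper's observation is that a diffusion model trained on discrete patterns implements an energy landscape asymptotically identical to the one this update descends.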
( 2
min )
We present Emu Video, a text-to-video generation model that factorizes the
generation into two steps: first generating an image conditioned on the text,
and then generating a video conditioned on the text and the generated image. We
identify critical design decisions--adjusted noise schedules for diffusion, and
multi-stage training--that enable us to directly generate high quality and high
resolution videos, without requiring a deep cascade of models as in prior work.
In human evaluations, our generated videos are strongly preferred in quality
compared to all prior work--81% vs. Google's Imagen Video, 90% vs. Nvidia's
PYOCO, and 96% vs. Meta's Make-A-Video. Our model outperforms commercial
solutions such as RunwayML's Gen2 and Pika Labs. Finally, our factorizing
approach naturally lends itself to animating images based on a user's text
prompt, where our generations are preferred 96% over prior work.
( 2
min )
Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima
and has been shown to enhance generalization performance in various settings.
In this work we show that perturbing only the affine normalization parameters
(typically comprising 0.1% of the total parameters) in the adversarial step of
SAM can outperform perturbing all of the parameters. This finding generalizes to
different SAM variants and both ResNet (Batch Normalization) and Vision
Transformer (Layer Normalization) architectures. We consider alternative sparse
perturbation approaches and find that these do not achieve similar performance
enhancement at such extreme sparsity levels, showing that this behaviour is
unique to the normalization layers. Although our findings reaffirm the
effectiveness of SAM in improving generalization performance, they cast doubt
on whether this is solely caused by reduced sharpness.
( 2
min )
Pull Requests (PRs) that are neither progressed nor resolved clutter the list
of PRs, making it difficult for the maintainers to manage and prioritize
unresolved PRs. To automatically track, follow up, and close such inactive PRs,
Stale bot was introduced by GitHub. Despite its increasing adoption, there are
ongoing debates on whether using Stale bot alleviates or exacerbates the
problem of inactive PRs. To better understand if and how Stale bot helps
projects in their pull-based development workflow, we perform an empirical
study of 20 large and popular open-source projects. We find that Stale bot can
help deal with a backlog of unresolved PRs as the projects closed more PRs
within the first few months of adoption. Moreover, Stale bot can help improve
the efficiency of the PR review process, as after adoption the projects more
quickly reviewed PRs that ended up merged and resolved PRs that ended up
closed.
However, Stale bot can also negatively affect the contributors as the projects
experienced a considerable decrease in their number of active contributors
after the adoption. Therefore, relying solely on Stale bot to deal with
inactive PRs may lead to decreased community engagement and an increased
probability of contributor abandonment.
( 3
min )
Automated creation of synthetic traffic scenarios is a key part of validating
the safety of autonomous vehicles (AVs). In this paper, we propose Scenario
Diffusion, a novel diffusion-based architecture for generating traffic
scenarios that enables controllable scenario generation. We combine latent
diffusion, object detection and trajectory regression to generate distributions
of synthetic agent poses, orientations and trajectories simultaneously. To
provide additional control over the generated scenario, this distribution is
conditioned on a map and sets of tokens describing the desired scenario. We
show that our approach has sufficient expressive capacity to model diverse
traffic patterns and generalizes to different geographical regions.
( 2
min )
Semiparametric efficient estimation of various multi-valued causal effects,
including quantile treatment effects, is important in economic, biomedical, and
other social sciences. Under the unconfoundedness condition, adjustment for
confounders requires estimating the nuisance functions relating outcome or
treatment to confounders nonparametrically. This paper considers a generalized
optimization framework for efficient estimation of general treatment effects
using artificial neural networks (ANNs) to approximate the unknown nuisance
function of growing-dimensional confounders. We establish a new approximation
error bound for the ANNs to the nuisance function belonging to a mixed
smoothness class without a known sparsity structure. We show that the ANNs can
alleviate the "curse of dimensionality" under this circumstance. We establish
the root-$n$ consistency and asymptotic normality of the proposed general
treatment effects estimators, and apply a weighted bootstrap procedure for
conducting inference. The proposed methods are illustrated via simulation
studies and a real data application.
( 2
min )
From classical HPC to deep learning, MatMul is at the heart of today's
computing. The recent Maddness method approximates MatMul without the need for
multiplication by using a hash-based version of product quantization (PQ)
indexing into a look-up table (LUT). Stella Nera is the first Maddness
accelerator and it achieves 15x higher area efficiency (GMAC/s/mm^2) and more
than 25x higher energy efficiency (TMAC/s/W) than direct MatMul accelerators
implemented in the same technology. The hash function is a decision tree, which
allows for an efficient hardware implementation as the multiply-accumulate
operations are replaced by decision tree passes and LUT lookups. The entire
Maddness MatMul can be broken down into parts that allow an effective
implementation with small computing units and memories, allowing it to reach
extreme efficiency while remaining generically applicable for MatMul tasks. In
a commercial 14nm technology and scaled to 3nm, we achieve an energy efficiency
of 161 TOp/s/W at 0.55 V with a Top-1 accuracy on CIFAR-10 of more than 92.5% using
ResNet9.
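The PQ-plus-LUT idea can be sketched in a few lines of NumPy. This is an illustrative approximation only: Maddness learns decision-tree hash encoders, whereas this sketch simply picks prototypes from A's own rows.

```python
import numpy as np

def pq_matmul(A, B, n_sub=4, n_proto=16, seed=0):
    """Toy product-quantization matmul in the spirit of Maddness:
    split A's columns into subspaces, snap each sub-vector to its
    nearest prototype, and replace multiplies with LUT lookups + adds."""
    rng = np.random.default_rng(seed)
    N, D = A.shape
    d = D // n_sub
    out = np.zeros((N, B.shape[1]))
    for c in range(n_sub):
        sub = slice(c * d, (c + 1) * d)
        # Prototypes drawn from A's rows (stand-in for learned encoders).
        protos = A[rng.choice(N, n_proto, replace=False), sub]   # (K, d)
        # Encode: index of the nearest prototype per row's sub-vector.
        dists = ((A[:, sub][:, None, :] - protos[None]) ** 2).sum(-1)
        codes = dists.argmin(1)                                  # (N,)
        lut = protos @ B[sub]       # precomputed partial products (K, M)
        out += lut[codes]           # no multiplies at "query" time
    return out
```

When every row is its own prototype the result is exact; with fewer prototypes than rows, the output is the quantized approximation that the LUT accelerates.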
( 2
min )
Despite the widespread use and success of machine-learning techniques for
detecting phase transitions from data, their working principle and fundamental
limits remain elusive. Here, we explain the inner workings and identify
potential failure modes of these techniques by rooting popular machine-learning
indicators of phase transitions in information-theoretic concepts. Using tools
from information geometry, we prove that several machine-learning indicators of
phase transitions approximate the square root of the system's (quantum) Fisher
information from below -- a quantity that is known to indicate phase
transitions but is often difficult to compute from data. We numerically
demonstrate the quality of these bounds for phase transitions in classical and
quantum systems.
( 2
min )
Decades of research indicate that emotion recognition is more effective when
drawing information from multiple modalities. But what if some modalities are
sometimes missing? To address this problem, we propose a novel
Transformer-based architecture for recognizing valence and arousal in a
time-continuous manner even with missing input modalities. We use a coupling of
cross-attention and self-attention mechanisms to emphasize relationships
between modalities during time and enhance the learning process on weak salient
inputs. Experimental results on the Ulm-TSST dataset show that our model
exhibits an improvement of the concordance correlation coefficient evaluation
of 37% when predicting arousal values and 30% when predicting valence values,
compared to a late-fusion baseline approach.
( 2
min )
The evolution of data management has kept pace with the rapid increase in data generation: after beginning with straightforward relational databases and ETL, the field saw big data and unstructured data pave the way for automated data pipelines and lakes. But this data cascade appears to have no end in sight. Contemporary data surpasses… Read More »From Confusion to Clarity: How AI Simplifies Data Management for Enterprises
The post From Confusion to Clarity: How AI Simplifies Data Management for Enterprises appeared first on Data Science Central.
( 21
min )
Last night’s changes in AI have been seismic following the shock resignation of Sam Altman. It is still early days, and these changes will play out over time. Undoubtedly, this change will impact AI roadmaps worldwide. So, how should you recalibrate your AI roadmap now that Sam Altman has left OpenAI? Has anything really changed? I was… Read More »Should you recalibrate your AI roadmap post changes in OpenAI ?
The post Should you recalibrate your AI roadmap post changes in OpenAI ? appeared first on Data Science Central.
( 20
min )
Retrieval Augmented Generation (RAG) allows you to provide a large language model (LLM) with access to data from external knowledge sources such as repositories, databases, and APIs without the need to fine-tune it. When using generative AI for question answering, RAG enables LLMs to answer questions with the most relevant, up-to-date information and optionally cite […]
( 14
min )
KT Corporation is one of the largest telecommunications providers in South Korea, offering a wide range of services including fixed-line telephone, mobile communication, internet, and AI services. KT’s AI Food Tag is an AI-based dietary management solution that identifies the type and nutritional content of food in photos using a computer vision model. This […]
( 11
min )
Lifelong model editing fixes mistakes discovered after model deployment. This work could expand sequential editing to model properties like fairness and privacy and enable a new class of solutions for adapting LLMs over long deployment lifetimes.
The post Lifelong model editing in large language models: Balancing low-cost targeted edits and catastrophic forgetting appeared first on Microsoft Research.
( 12
min )
MIT CSAIL researchers innovate with synthetic imagery to train AI, paving the way for more efficient and bias-reduced machine learning.
( 9
min )
In the era of transfer learning, training neural networks from scratch is
becoming obsolete. Transfer learning leverages prior knowledge for new tasks,
conserving computational resources. While its advantages are well-documented,
we uncover a notable drawback: networks tend to prioritize basic data patterns,
forsaking valuable pre-learned features. We term this behavior "feature
erosion" and analyze its impact on network performance and internal
representations.
( 2
min )
It stands to reason that the amount and the quality of big data is of key
importance for setting up accurate AI-driven models. Nonetheless, we believe
there are still critical roadblocks in the inherent generation of databases,
that are often underestimated and poorly discussed in the literature. In our
view, such issues can seriously hinder the AI-based discovery process, even
when high quality, sufficiently large and highly reputable data sources are
available. Here, considering superconducting and thermoelectric materials as
two representative case studies, we specifically discuss three aspects: intrinsically biased sample selection, possible hidden variables, and disparate data age. Importantly, we suggest and test what is, to our knowledge, the first strategy capable of detecting and quantifying the presence of intrinsic data bias.
( 2
min )
This paper presents Fossil 2.0, a new major release of a software tool for
the synthesis of certificates (e.g., Lyapunov and barrier functions) for
dynamical systems modelled as ordinary differential and difference equations.
Fossil 2.0 is much improved from its original release, including new
interfaces, a significantly expanded certificate portfolio, controller
synthesis and enhanced extensibility. We present these new features as part of
this tool paper. Fossil implements a counterexample-guided inductive synthesis
(CEGIS) loop ensuring the soundness of the method. Our tool uses neural
networks as templates to generate candidate functions, which are then formally
proven by an SMT solver acting as an assertion verifier. Improvements with
respect to the first release include a wider range of certificates, synthesis
of control laws, and support for discrete-time models.
( 2
min )
This paper studies convergence rates for some value function approximations
that arise in a collection of reproducing kernel Hilbert spaces (RKHS)
$H(\Omega)$. By casting an optimal control problem in a specific class of
native spaces, strong rates of convergence are derived for the operator
equation that enables offline approximations that appear in policy iteration.
Explicit upper bounds on error in value function and controller approximations
are derived in terms of power function $\Pwr_{H,N}$ for the space of finite
dimensional approximants $H_N$ in the native space $H(\Omega)$. These bounds
are geometric in nature and refine some well-known, now classical results
concerning convergence of approximations of value functions.
( 2
min )
This paper focuses on the task of Extreme Multi-Label Classification (XMC)
whose goal is to predict multiple labels for each instance from an extremely
large label space. While existing research has primarily focused on fully
supervised XMC, real-world scenarios often lack complete supervision signals,
highlighting the importance of zero-shot settings. Given the large label space,
utilizing in-context learning approaches is not trivial. We address this issue
by introducing In-Context Extreme Multilabel Learning (ICXML), a two-stage
framework that cuts down the search space by generating a set of candidate
labels through in-context learning and then reranks them. Extensive experiments
suggest that ICXML advances the state of the art on two diverse public
benchmarks.
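The generate-then-rerank structure can be sketched as follows. Both stages here are toy stand-ins: `llm_generate` is a hypothetical in-context-learning call, and the reranker is a simple lexical-overlap score, not the paper's actual components.

```python
def icxml(instance, llm_generate, label_space, top_k=5):
    """Two-stage sketch: (1) generate candidate labels with an LLM to
    shrink the extreme label space, (2) rerank the shortlist."""
    # Stage 1: free-form candidate generation via in-context learning.
    generated = llm_generate(f"Suggest labels for: {instance}")
    tokens = {t for g in generated for t in g.lower().split()}
    # Map generations onto the actual label space.
    candidates = [l for l in label_space
                  if tokens & set(l.lower().split())]
    # Stage 2: rerank candidates by overlap with the instance text.
    inst_tokens = set(instance.lower().split())
    candidates.sort(key=lambda l: len(inst_tokens & set(l.lower().split())),
                    reverse=True)
    return candidates[:top_k]
```

The point of stage 1 is purely computational: reranking a handful of generated candidates is tractable where scoring an extreme label space is not.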
( 2
min )
Accurately predicting drug-drug interactions (DDI) for emerging drugs, which
offer possibilities for treating and alleviating diseases, with computational
methods can improve patient care and contribute to efficient drug development.
However, many existing computational methods require large amounts of known DDI
information, which is scarce for emerging drugs. In this paper, we propose
EmerGNN, a graph neural network (GNN) that can effectively predict interactions
for emerging drugs by leveraging the rich information in biomedical networks.
EmerGNN learns pairwise representations of drugs by extracting the paths
between drug pairs, propagating information from one drug to the other, and
incorporating the relevant biomedical concepts on the paths. The different
edges on the biomedical network are weighted to indicate the relevance for the
target DDI prediction. Overall, EmerGNN has higher accuracy than existing
approaches in predicting interactions for emerging drugs and can identify the
most relevant information on the biomedical network.
( 2
min )
We explore the abstract reasoning abilities of text-only and multimodal
versions of GPT-4, using the ConceptARC benchmark [10], which is designed to
evaluate robust understanding and reasoning with core-knowledge concepts. We
extend the work of Moskvichev et al. [10] by evaluating GPT-4 on more detailed,
one-shot prompting (rather than simple, zero-shot prompts) with text versions
of ConceptARC tasks, and by evaluating GPT-4V, the multimodal version of GPT-4,
on zero- and one-shot prompts using image versions of the simplest tasks. Our
experimental results support the conclusion that neither version of GPT-4 has
developed robust abstraction abilities at humanlike levels.
( 2
min )
The widespread integration of autoregressive-large language models (AR-LLMs),
such as ChatGPT, across established applications, like search engines, has
introduced critical vulnerabilities with uniquely scalable characteristics. In
this commentary, we analyse these vulnerabilities, their dependence on natural
language as a vector of attack, and their challenges to cybersecurity best
practices. We offer recommendations designed to mitigate these challenges.
( 2
min )
We applied a data-driven approach that explores the usability of the NetMob
2023 dataset in modelling mobility patterns within an urban context. We
combined the data with a highly suitable external source, the ENACT dataset,
which provides a 1 km x 1km grid with estimates of the day and night population
across Europe. We developed three sets of XGBoost models that predict the
population in each 100m x 100m grid cell used in NetMob2023 based on the mobile
data traffic of the 68 online services covered in the dataset, using the ENACT
values as ground truth. The results suggest that the NetMob 2023 data can be useful for estimating the day and night population at grid-cell level and can explain part of the dynamics of urban mobility.
( 2
min )
The theory of statistical learning has focused on variational objectives
expressed on functions. In this note, we discuss motivations to write similar
objectives on measures, in particular to discuss out-of-distribution
generalization and weakly-supervised learning. It raises a natural question:
can one cast usual statistical learning results to objectives expressed on
measures? Does the resulting construction lead to new algorithms of practical
interest?
( 2
min )
Individualized treatment decisions can improve health outcomes, but using
data to make these decisions in a reliable, precise, and generalizable way is
challenging with a single dataset. Leveraging multiple randomized controlled
trials allows for the combination of datasets with unconfounded treatment
assignment to better estimate heterogeneous treatment effects. This paper
discusses several non-parametric approaches for estimating heterogeneous
treatment effects using data from multiple trials. We extend single-study
methods to a scenario with multiple trials and explore their performance
through a simulation study, with data generation scenarios that have differing
levels of cross-trial heterogeneity. The simulations demonstrate that methods
that directly allow for heterogeneity of the treatment effect across trials
perform better than methods that do not, and that the choice of single-study
method matters based on the functional form of the treatment effect. Finally,
we discuss which methods perform well in each setting and then apply them to
four randomized controlled trials to examine effect heterogeneity of treatments
for major depressive disorder.
( 2
min )
How do score-based generative models (SBMs) learn the data distribution
supported on a low-dimensional manifold? We investigate the score model of a
trained SBM through its linear approximations and subspaces spanned by local
feature vectors. During diffusion as the noise decreases, the local
dimensionality increases and becomes more varied between different sample
sequences. Importantly, we find that the learned vector field mixes samples by
a non-conservative field within the manifold, although it denoises with normal
projections as if there is an energy function in off-manifold directions. At
each noise level, the subspace spanned by the local features overlaps with an
effective density function. These observations suggest that SBMs can flexibly
mix samples with the learned score field while carefully maintaining a
manifold-like structure of the data distribution.
( 2
min )
Estimating categorical distributions under marginal constraints, so as to summarize a sample from a population in the most generalizable way, is key for many machine-learning and data-driven approaches. We provide a
parameter-agnostic theoretical framework that enables this task ensuring (i)
that a categorical distribution of Maximum Entropy under marginal constraints
always exists and (ii) that it is unique. The procedure of iterative
proportional fitting (IPF) naturally estimates that distribution from any
consistent set of marginal constraints directly in the space of probabilities,
thus deductively identifying a least-biased characterization of the population.
The theoretical framework together with IPF leads to a holistic workflow that
enables modeling any class of categorical distributions solely using the
phenomenological information provided.
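The IPF procedure itself fits in a few lines: starting from the uniform (maximum-entropy) table, alternately rescale rows and columns to match the target marginals. This sketch covers the two-dimensional case; the defaults are illustrative.

```python
import numpy as np

def ipf(row_marg, col_marg, n_iter=500, tol=1e-12):
    """Iterative proportional fitting. The fixed point is the unique
    maximum-entropy categorical distribution whose marginals match
    the given (consistent) row and column constraints."""
    row_marg = np.asarray(row_marg, float)
    col_marg = np.asarray(col_marg, float)
    p = np.full((row_marg.size, col_marg.size), 1.0)
    p /= p.sum()                                   # uniform start
    for _ in range(n_iter):
        p *= (row_marg / p.sum(axis=1))[:, None]   # fit row sums
        p *= (col_marg / p.sum(axis=0))[None, :]   # fit column sums
        if np.abs(p.sum(axis=1) - row_marg).max() < tol:
            break
    return p
```

With only the two one-dimensional marginals as constraints, the least-biased solution is the independent table, i.e. the outer product of the marginals, which IPF recovers directly in the space of probabilities.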
( 2
min )
Despite a great deal of research, it is still not well-understood why trained
neural networks are highly vulnerable to adversarial examples. In this work we
focus on two-layer neural networks trained using data which lie on a low
dimensional linear subspace. We show that standard gradient methods lead to
non-robust neural networks, namely, networks which have large gradients in
directions orthogonal to the data subspace, and are susceptible to small
adversarial $L_2$-perturbations in these directions. Moreover, we show that
decreasing the initialization scale of the training algorithm, or adding $L_2$
regularization, can make the trained network more robust to adversarial
perturbations orthogonal to the data.
( 2
min )
Urban traffic congestion remains a pressing challenge in our rapidly
expanding cities, despite the abundance of available data and the efforts of
policymakers. By leveraging behavioral system theory and data-driven control,
this paper exploits the DeePC algorithm in the context of urban traffic control
performed via dynamic traffic lights. To validate our approach, we consider a
high-fidelity case study using the state-of-the-art simulation software package
Simulation of Urban MObility (SUMO). Preliminary results indicate that DeePC
outperforms existing approaches across various key metrics, including travel
time and CO$_2$ emissions, demonstrating its potential for effective traffic
management.
( 2
min )
We tackle in this paper an online network resource allocation problem with
job transfers. The network is composed of many servers connected by
communication links. The system operates in discrete time; at each time slot,
the administrator reserves resources at servers for future job requests, and a
cost is incurred for the reservations made. Then, once the jobs are received, they may be transferred between the servers to best accommodate the demands. This
incurs an additional transport cost. Finally, if a job request cannot be
satisfied, there is a violation that engenders a cost to pay for the blocked
job. We propose a randomized online algorithm based on the exponentially
weighted method. We prove that our algorithm enjoys a sub-linear in time
regret, which indicates that the algorithm is adapting and learning from its
experiences and is becoming more efficient in its decision-making as it
accumulates more data. Moreover, we test the performance of our algorithm on
artificial data and compare it against a reinforcement learning method where we
show that our proposed method outperforms the latter.
( 2
min )
Large language models (LLMs) such as LLaMA and OpenAI’s GPT-4 are revolutionizing technology. However, one of the common complaints about LLMs is their speed, or lack thereof. In many cases, it takes a long time to get an answer from them. This limits LLMs’ applications and their usefulness in latency-critical functions, such as chatbots, copilots, […]
The post Skeleton-of-Thought: Parallel decoding speeds up and improves LLM output appeared first on Microsoft Research.
( 12
min )
Good news for car lovers: Two acclaimed auto shows, taking place now through next week, are delighting attendees with displays of next-generation automotive designs powered by AI. Hundreds of thousands of auto enthusiasts worldwide are expected to visit Guangzhou, China — known as the city of flowers — to attend its auto show, running through Read article >
( 6
min )
European startups will get a massive boost from a new generation of AI infrastructure, NVIDIA founder and CEO Jensen Huang said Friday in a fireside chat with iliad Group Deputy CEO Aude Durand — and it’s coming just in time. “We’re now seeing a major second wave,” Huang said of the state of AI during Read article >
( 7
min )
Europe’s startup ecosystem is getting a boost of accelerated computing for generative AI. NVIDIA and cloud service provider (CSP) Scaleway are working together to deliver access to GPUs, NVIDIA AI Enterprise software, and services for turbocharging large language models (LLMs) and generative AI development for European startups. Scaleway, a subsidiary of French telecommunications provider iliad Read article >
( 6
min )
Amazon Interactive Video Service (Amazon IVS) is a managed live streaming solution that is designed to provide a quick and straightforward setup to let you build interactive video experiences and handles interactive video content from ingestion to delivery. With the increased usage of live streaming, the need for effective content moderation becomes even more crucial. […]
( 9
min )
Generative AI models have the potential to revolutionize enterprise operations, but businesses must carefully consider how to harness their power while overcoming challenges such as safeguarding data and ensuring the quality of AI-generated content. The Retrieval-Augmented Generation (RAG) framework augments prompts with external data from multiple sources, such as document repositories, databases, or APIs, to […]
( 6
min )
From enhancing the conversational experience to agent assistance, there are plenty of ways that generative artificial intelligence (AI) and foundation models (FMs) can help deliver faster, better support. With the increasing availability and diversity of FMs, it’s difficult to experiment and keep up-to-date with the latest model versions. Amazon Bedrock is a fully managed service […]
( 7
min )
This paper explores the potential of the transformer models for learning
Granger causality in networks with complex nonlinear dynamics at every node, as
in neurobiological and biophysical networks. Our study primarily focuses on a
proof-of-concept investigation based on simulated neural dynamics, for which
the ground-truth causality is known through the underlying connectivity matrix.
For transformer models trained to forecast neuronal population dynamics, we
show that the cross attention module effectively captures the causal
relationship among neurons, with an accuracy equal to or superior to that of the most popular Granger causality analysis method. While we acknowledge that
real-world neurobiology data will bring further challenges, including dynamic
connectivity and unobserved variability, this research offers an encouraging
preliminary glimpse into the utility of the transformer model for causal
representation learning in neuroscience.
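The classical linear baseline the abstract compares against can be sketched directly: does adding x's past improve a linear forecast of y beyond y's own past? This is a minimal bivariate version (lag order and the F-ratio form are the standard textbook choices, not the paper's exact setup).

```python
import numpy as np

def granger_f(x, y, lag=2):
    """Linear Granger test statistic: F-ratio of a restricted model
    (y regressed on its own lags) vs. a full model that also includes
    x's lags. Large values suggest x Granger-causes y."""
    n = len(y)
    T = n - lag
    Y = y[lag:]
    y_lags = [y[lag - k:n - k] for k in range(1, lag + 1)]
    x_lags = [x[lag - k:n - k] for k in range(1, lag + 1)]

    def rss(cols):
        X = np.column_stack([np.ones(T)] + cols)
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return r @ r

    rss_r, rss_f = rss(y_lags), rss(y_lags + x_lags)
    return ((rss_r - rss_f) / lag) / (rss_f / (T - 2 * lag - 1))
```

A driven pair (y lagging x) yields a large statistic, while an unrelated series yields one near 1, which is the kind of ground-truth check the simulated-network setting makes possible.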
( 2
min )
Hyperparameters of Deep Learning (DL) pipelines are crucial for their
downstream performance. While a large number of methods for Hyperparameter
Optimization (HPO) have been developed, their incurred costs are often
untenable for modern DL. Consequently, manual experimentation is still the most
prevalent approach to optimize hyperparameters, relying on the researcher's
intuition, domain knowledge, and cheap preliminary explorations. To resolve
this misalignment between HPO algorithms and DL researchers, we propose
PriorBand, an HPO algorithm tailored to DL, able to utilize both expert beliefs
and cheap proxy tasks. Empirically, we demonstrate PriorBand's efficiency
across a range of DL benchmarks and show its gains under informative expert
input and robustness against poor expert beliefs.
( 2
min )
In recent years, Reward Machines (RMs) have stood out as a simple yet
effective automata-based formalism for exposing and exploiting task structure
in reinforcement learning settings. Despite their relevance, little to no
attention has been directed to the study of their security implications and
robustness to adversarial scenarios, likely due to their recent appearance in
the literature. With my thesis, I aim to provide the first analysis of the
security of RM-based reinforcement learning techniques, with the hope of
motivating further research in the field, and I propose and evaluate a novel
class of attacks on RM-based techniques: blinding attacks.
( 2
min )
An ideal length-extrapolatable Transformer language model can handle
sequences longer than the training length without any fine-tuning. Such
long-context utilization capability relies heavily on a flexible positional
embedding design. Upon investigating the flexibility of existing large
pre-trained Transformer language models, we find that the T5 family deserves a
closer look, as its positional embeddings capture rich and flexible attention
patterns. However, T5 suffers from the dispersed attention issue: the longer
the input sequence, the flatter the attention distribution. To alleviate the
issue, we propose two attention alignment strategies via temperature scaling.
Our findings show improvement on the long-context utilization capability of T5
on language modeling, retrieval, multi-document question answering, and code
completion tasks without any fine-tuning. This suggests that a flexible
positional embedding design and attention alignment can go a long way toward
Transformer length extrapolation.
( 2
min )
Named Entity Recognition (NER) is essential in various Natural Language
Processing (NLP) applications. Traditional NER models are effective but limited
to a set of predefined entity types. In contrast, Large Language Models (LLMs)
can extract arbitrary entities through natural language instructions, offering
greater flexibility. However, their size and cost, particularly for those
accessed via APIs like ChatGPT, make them impractical in resource-limited
scenarios. In this paper, we introduce a compact NER model trained to identify
any type of entity. Leveraging a bidirectional transformer encoder, our model,
GLiNER, facilitates parallel entity extraction, an advantage over the slow
sequential token generation of LLMs. Through comprehensive testing, GLiNER
demonstrates strong performance, outperforming both ChatGPT and fine-tuned LLMs
in zero-shot evaluations on various NER benchmarks.
( 2
min )
Traffic simulators are used to generate data for learning in intelligent
transportation systems (ITSs). A key question is to what extent their modelling
assumptions affect the capabilities of ITSs to adapt to various scenarios when
deployed in the real world. This work focuses on two simulators commonly used
to train reinforcement learning (RL) agents for traffic applications, CityFlow
and SUMO. A controlled virtual experiment varying driver behavior and
simulation scale finds evidence against distributional equivalence in
RL-relevant measures from these simulators, with the root mean squared error
and KL divergence being significantly greater than 0 for all assessed measures.
While granular real-world validation generally remains infeasible, these
findings suggest that traffic simulators are not a deus ex machina for RL
training: understanding the impacts of inter-simulator differences is necessary
to train and deploy RL-based ITSs.
( 2
min )
Some stars are known to explode at the end of their lives, called supernovae
(SNe). SNe release a substantial amount of matter and energy into the interstellar medium, providing significant feedback to star formation and gas dynamics in a galaxy. While such feedback has a crucial role in galaxy formation and evolution, simulations of galaxy formation have only implemented it using simple sub-grid models instead of numerically solving the evolution
of gas elements around SNe in detail due to a lack of resolution. We develop a
method combining machine learning and Gibbs sampling to predict how a supernova
(SN) affects the surrounding gas. The fidelity of our model in the thermal
energy and momentum distribution outperforms the low-resolution SN simulations.
Our method can replace the SN sub-grid models and help properly simulate
un-resolved SN feedback in galaxy formation simulations. We find that employing
our new approach reduces the necessary computational cost to ~1 percent
compared to directly resolving SN feedback.
( 2
min )
Recognizing emotions in spoken communication is crucial for advanced
human-machine interaction. Current emotion detection methodologies often
display biases when applied cross-corpus. To address this, our study
amalgamates 16 diverse datasets, resulting in 375 hours of data across
languages like English, Chinese, and Japanese. We propose a soft labeling
system to capture gradational emotional intensities. Using the Whisper encoder
and data augmentation methods inspired by contrastive learning, our method
emphasizes the temporal dynamics of emotions. Our validation on four
multilingual datasets demonstrates notable zero-shot generalization. We publish
our open source model weights and initial promising results after fine-tuning
on Hume-Prosody.
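A soft-labeling scheme of the kind described can be sketched as follows: annotator votes become a probability distribution over emotion classes rather than a single hard label, and training minimizes cross-entropy against these soft targets. The details are illustrative, not the paper's exact recipe.

```python
import numpy as np

def soft_labels(votes_per_sample, n_classes):
    """Turn per-sample annotator votes (lists of class indices) into
    soft target distributions capturing gradational intensity."""
    out = np.zeros((len(votes_per_sample), n_classes))
    for i, votes in enumerate(votes_per_sample):
        for v in votes:
            out[i, v] += 1.0
        out[i] /= out[i].sum()
    return out

def soft_cross_entropy(logits, targets):
    # Cross-entropy against soft targets (not one-hot argmax labels).
    logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    return -(targets * logp).sum(-1).mean()
```

Compared with taking the argmax vote, the soft target preserves annotator disagreement, which is exactly the gradational-intensity signal the abstract argues matters for cross-corpus generalization.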
( 2
min )
This study proposes a new approach that investigates differences in
topological characteristics of visual networks, which are constructed using
fMRI BOLD time-series corresponding to visual datasets of COCO, ImageNet, and
SUN. A publicly available BOLD5000 dataset is utilized that contains fMRI scans
while viewing 5254 images of diverse complexities. The objective of this study
is to examine how network topology differs in response to distinct visual
stimuli from these visual datasets. To achieve this, 0- and 1-dimensional
persistence diagrams are computed for each visual network representing COCO,
ImageNet, and SUN. For extracting suitable features from topological
persistence diagrams, K-means clustering is executed. The extracted K-means
cluster features are fed to a novel deep-hybrid model that yields accuracy in
the range of 90%-95% in classifying these visual networks. To understand
vision, this type of visual network categorization across visual datasets is
important as it captures differences in BOLD signals while perceiving images
with different contexts and complexities. Furthermore, distinctive topological
patterns of visual network associated with each dataset, as revealed from this
study, could potentially lead to the development of future neuroimaging
biomarkers for diagnosing visual processing disorders like visual agnosia or
prosopagnosia, and tracking changes in visual cognition over time.
( 3
min )
This article provides an understanding of Natural Language Processing
techniques in the framework of financial regulation, more specifically in order
to perform semantic matching search between rules and policy when no dataset is
available for supervised learning. We outline how to outperform simple
pre-trained sentence-transformer models using freely available resources and
explain the mathematical concepts behind the key building blocks of Natural
Language Processing.
( 2
min )
We study mean-field variational inference in a Bayesian linear model when the
sample size n is comparable to the dimension p. In high dimensions, the common
approach of minimizing a Kullback-Leibler divergence from the posterior
distribution, or maximizing an evidence lower bound, may deviate from the true
posterior mean and underestimate posterior uncertainty. We study instead
minimization of the TAP free energy, showing in a high-dimensional asymptotic
framework that it has a local minimizer which provides a consistent estimate of
the posterior marginals and may be used for correctly calibrated posterior
inference. Geometrically, we show that the landscape of the TAP free energy is
strongly convex in an extensive neighborhood of this local minimizer, which
under certain general conditions can be found by an Approximate Message Passing
(AMP) algorithm. We then exhibit an efficient algorithm that linearly converges
to the minimizer within this local neighborhood. In settings where it is
conjectured that no efficient algorithm can find this local neighborhood, we
prove analogous geometric properties for a local minimizer of the TAP free
energy reachable by AMP, and show that posterior inference based on this
minimizer remains correctly calibrated.
( 2
min )
Modeling is crucial to understanding the effect of greenhouse gases, warming,
and ice sheet melting on the ocean. At the same time, ocean processes affect
phenomena such as hurricanes and droughts. Parameters in the models that cannot
be physically measured have a significant effect on the model output. For an
idealized ocean model, we generated perturbed-parameter ensemble data and
trained surrogate neural network models. The neural surrogates accurately
predicted the one-step forward dynamics, and we then computed their parametric
sensitivity.
( 2
min )
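The parametric-sensitivity step can be sketched as follows, using a hypothetical closed-form stand-in for the trained neural surrogate and central finite differences; the paper's actual surrogate, parameters, and sensitivity method are not specified here.

```python
import numpy as np

def surrogate_step(state: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Toy stand-in for a trained neural surrogate: one-step forward
    dynamics as a smooth function of the state and model parameters."""
    return state + 0.1 * np.tanh(params[0] * state) - 0.01 * params[1] * state

def parametric_sensitivity(state, params, eps=1e-5):
    """Central finite differences of the one-step output w.r.t. each parameter."""
    sens = []
    for i in range(len(params)):
        up, down = params.copy(), params.copy()
        up[i] += eps
        down[i] -= eps
        sens.append((surrogate_step(state, up) - surrogate_step(state, down)) / (2 * eps))
    return np.stack(sens)

state = np.array([0.5, -0.2])
params = np.array([1.0, 2.0])
S = parametric_sensitivity(state, params)  # shape (n_params, n_state)
```

With an actual neural surrogate, automatic differentiation would typically replace the finite-difference loop.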
Dynamic Item Response Models extend the standard Item Response Theory (IRT)
to capture temporal dynamics in learner ability. While these models have the
potential to allow instructional systems to actively monitor the evolution of
learner proficiency in real time, existing dynamic item response models rely on
expensive inference algorithms that scale poorly to massive datasets. In this
work, we propose Variational Temporal IRT (VTIRT) for fast and accurate
inference of dynamic learner proficiency. VTIRT offers orders of magnitude
speedup in inference runtime while still providing accurate inference.
Moreover, the proposed algorithm is intrinsically interpretable by virtue of
its modular design. When applied to 9 real student datasets, VTIRT consistently
yields improvements in predicting future learner performance over other learner
proficiency models.
( 2
min )
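For readers unfamiliar with IRT, a minimal sketch of the standard 1PL (Rasch) response model with a hypothetical drifting-ability dynamic (not the VTIRT algorithm itself) looks like:

```python
import math

def irt_prob(ability: float, difficulty: float) -> float:
    """1PL (Rasch) item response: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def simulate_dynamic_ability(theta0: float, gains: list) -> list:
    """Hypothetical dynamic extension: ability drifts by a per-step gain,
    as in a random-walk transition model (drift shown deterministically)."""
    thetas = [theta0]
    for g in gains:
        thetas.append(thetas[-1] + g)
    return thetas

# A learner who improves by 0.3 per step against an item of difficulty 0.5.
thetas = simulate_dynamic_ability(0.0, [0.3, 0.3, 0.3])
probs = [irt_prob(t, difficulty=0.5) for t in thetas]
```

VTIRT's contribution is fast variational inference over such latent trajectories, not the response model itself.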
Approximate message passing (AMP) is a family of iterative algorithms that
generalize matrix power iteration. AMP algorithms are known to optimally solve
many average-case optimization problems. In this paper, we show that a large
class of AMP algorithms can be simulated in polynomial time by \emph{local
statistics hierarchy} semidefinite programs (SDPs), even when an unknown
principal minor of measure $1/\mathrm{polylog}(\mathrm{dimension})$ is
adversarially corrupted. Ours are the first robust guarantees for many of these
problems. Further, our results offer an interesting counterpoint to strong
lower bounds against less constrained SDP relaxations for average-case
max-cut-gain (a.k.a. "optimizing the Sherrington-Kirkpatrick Hamiltonian") and
other problems.
( 2
min )
Self-supervised representation learning often uses data augmentations to
induce some invariance to "style" attributes of the data. However, with
downstream tasks generally unknown at training time, it is difficult to deduce
a priori which attributes of the data are indeed "style" and can be safely
discarded. To address this, we introduce a more principled approach that seeks
to disentangle style features rather than discard them. The key idea is to add
multiple style embedding spaces where: (i) each is invariant to all-but-one
augmentation; and (ii) joint entropy is maximized. We formalize our structured
data-augmentation procedure from a causal latent-variable-model perspective,
and prove identifiability of both content and (multiple blocks of) style
variables. We empirically demonstrate the benefits of our approach on synthetic
datasets and then present promising but limited results on ImageNet.
( 2
min )
Image Segmentation is one of the core tasks in Computer Vision, and solving it
often depends on modeling the image appearance data via the color distributions
of each of its constituent regions. Whereas many segmentation algorithms handle
the appearance-model dependence using alternation or implicit methods, we
propose here a new approach to directly estimate the models from the image
without prior information on the underlying segmentation. Our method uses local
high-order color statistics from the image as input to a tensor
factorization-based estimator for latent variable models. This approach is able
to estimate models in multi-region images and automatically output the region
proportions without prior user interaction, overcoming the drawbacks of a prior
attempt at this problem. We also demonstrate the performance of our proposed
method in many challenging synthetic and real imaging scenarios and show that
it leads to an efficient segmentation algorithm.
( 2
min )
We consider a deep neural network estimator based on empirical risk
minimization with l_1-regularization. We derive a general bound for its excess
risk in regression and classification (including multiclass), and prove that it
is adaptively nearly-minimax (up to log-factors) simultaneously across the
entire range of various function classes.
( 2
min )
We study the training of deep neural networks by gradient descent where
floating-point arithmetic is used to compute the gradients. In this framework
and under realistic assumptions, we demonstrate that it is highly unlikely to
find ReLU neural networks that maintain, in the course of training with
gradient descent, superlinearly many affine pieces with respect to their number
of layers. In virtually all approximation theoretical arguments that yield
high-order polynomial rates of approximation, sequences of ReLU neural networks
with exponentially many affine pieces compared to their numbers of layers are
used. As a consequence, we conclude that approximating sequences of ReLU neural
networks resulting from gradient descent in practice differ substantially from
theoretically constructed sequences. The assumptions and the theoretical
results are compared to a numerical study, which yields concurring results.
( 2
min )
How do you train an AI to understand clinical language with less clinical data? Train another AI to synthesize training data. Artificial intelligence is changing the way medicine is done, and is increasingly being used in all sorts of clinical tasks. This is fueled by generative AI and models like GatorTronGPT, a generative language model Read article >
( 5
min )
Human analysts can no longer effectively defend against the increasing speed and complexity of cybersecurity attacks. The amount of data is simply too large to screen manually. Generative AI, the most transformative tool of our time, enables a kind of digital jiu jitsu. It lets companies shift the force of data that threatens to overwhelm Read article >
( 6
min )
3D artists can improve the productivity and efficiency of their generative AI-enabled content-creation workflows thanks to the latest updates to popular OpenUSD software.
( 7
min )
The fastest way to give the gift of cloud gaming starts this GFN Thursday: For a limited time, every six-month GeForce NOW Ultimate membership includes three months of PC Game Pass. Also, the newest GeForce NOW app update is rolling out to members, including Xbox Game Syncing and more improvements. Plus, take advantage of a Read article >
( 7
min )
This is a joint blog with AWS and Philips. Philips is a health technology company focused on improving people’s lives through meaningful innovation. Since 2014, the company has been offering customers its Philips HealthSuite Platform, which orchestrates dozens of AWS services that healthcare and life sciences companies use to improve patient care. It partners with […]
( 15
min )
Whisper is an Automatic Speech Recognition (ASR) model that has been trained using 680,000 hours of supervised data from the web, encompassing a range of languages and tasks. One of its limitations is the low-performance on low-resource languages such as Marathi language and Dravidian languages, which can be remediated with fine-tuning. However, fine-tuning a Whisper […]
( 7
min )
Determining the value of housing is a classic example of using machine learning (ML). In this post, we discuss the use of an open-source model specifically designed for the task of visual question answering (VQA). With VQA, you can ask a question of a photo using natural language and receive an answer to your question—also in plain language. Our goal in this post is to inspire and demonstrate what is possible using this technology.
( 12
min )
With the PockEngine training method, machine-learning models can efficiently and continuously learn from user data on edge devices like smartphones.
( 10
min )
Several recent works have studied the convergence \textit{in high
probability} of stochastic gradient descent (SGD) and its clipped variant.
Compared to vanilla SGD, clipped SGD is practically more stable and has the
additional theoretical benefit of logarithmic dependence on the failure
probability. However, the convergence of other practical nonlinear variants of
SGD, e.g., sign SGD, quantized SGD and normalized SGD, that achieve improved
communication efficiency or accelerated convergence is much less understood. In
this work, we study the convergence bounds \textit{in high probability} of a
broad class of nonlinear SGD methods. For strongly convex loss functions with
Lipschitz continuous gradients, we prove a logarithmic dependence on the
failure probability, even when the noise is heavy-tailed. Strictly more general
than the results for clipped SGD, our results hold for any nonlinearity with
bounded (component-wise or joint) outputs, such as clipping, normalization, and
quantization. Further, existing results with heavy-tailed noise assume bounded
$\eta$-th central moments, with $\eta \in (1,2]$. In contrast, our refined
analysis works even for $\eta=1$, strictly relaxing the noise moment
assumptions in the literature.
( 2
min )
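The bounded nonlinearities mentioned above can be sketched directly; the clipping threshold and example gradient below are arbitrary illustrations:

```python
import numpy as np

def clip(g: np.ndarray, c: float) -> np.ndarray:
    """Joint clipping: rescale the gradient if its norm exceeds c."""
    n = np.linalg.norm(g)
    return g * min(1.0, c / n) if n > 0 else g

def normalize(g: np.ndarray) -> np.ndarray:
    """Normalized SGD: keep only the direction of the gradient."""
    n = np.linalg.norm(g)
    return g / n if n > 0 else g

def sign(g: np.ndarray) -> np.ndarray:
    """Sign SGD: component-wise sign (bounded component-wise output)."""
    return np.sign(g)

g = np.array([3.0, 4.0])   # norm 5
gc = clip(g, 1.0)          # rescaled to norm 1
gn = normalize(g)          # unit direction
gs = sign(g)               # [1., 1.]
```

All three outputs are bounded, which is the property the analysis exploits under heavy-tailed noise.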
This paper presents a policy parameterization for learning-based control on
nonlinear, partially-observed dynamical systems. The parameterization is based
on a nonlinear version of the Youla parameterization and the recently proposed
Recurrent Equilibrium Network (REN) class of models. We prove that the
resulting Youla-REN parameterization automatically satisfies stability
(contraction) and user-tunable robustness (Lipschitz) conditions on the
closed-loop system. This means it can be used for safe learning-based control
with no additional constraints or projections required to enforce stability or
robustness. We test the new policy class in simulation on two reinforcement
learning tasks: 1) magnetic suspension, and 2) inverting a rotary-arm pendulum.
We find that the Youla-REN performs similarly to existing learning-based and
optimal control methods while also ensuring stability and exhibiting improved
robustness to adversarial disturbances.
( 2
min )
Determining, understanding, and predicting the so-called structure-property
relation is an important task in many scientific disciplines, such as
chemistry, biology, meteorology, physics, engineering, and materials science.
Structure refers to the spatial distribution of, e.g., substances, material, or
matter in general, while property is a resulting characteristic that usually
depends in a non-trivial way on spatial details of the structure.
Traditionally, forward simulation models have been used for such tasks.
Recently, several machine learning algorithms have been applied in these
scientific fields to enhance and accelerate simulation models or as surrogate
models. In this work, we develop and investigate the applications of six
machine learning techniques based on two different datasets from the domain of
materials science: data from a two-dimensional Ising model for predicting the
formation of magnetic domains and data representing the evolution of dual-phase
microstructures from the Cahn-Hilliard model. We analyze the accuracy and
robustness of all models and elucidate the reasons for the differences in their
performances. The impact of including domain knowledge through tailored
features is studied, and general recommendations based on the availability and
quality of training data are derived from this.
( 2
min )
Partial monitoring is an expressive framework for sequential decision-making
with an abundance of applications, including graph-structured and dueling
bandits, dynamic pricing and transductive feedback models. We survey and extend
recent results on the linear formulation of partial monitoring that naturally
generalizes the standard linear bandit setting. The main result is that a
single algorithm, information-directed sampling (IDS), is (nearly) worst-case
rate optimal in all finite-action games. We present a simple and unified
analysis of stochastic partial monitoring, and further extend the model to the
contextual and kernelized setting.
( 2
min )
Slip and crumple detection is essential for performing robust manipulation
tasks with a robotic hand (RH), for example in remote surgery, and it has been
one of the challenging problems in the robotic manipulation community. In this
work, we propose a machine learning (ML) based technique to detect slip and
crumple, as well as the shape of the object currently held in the robotic hand.
The proposed ML model detects the slip, crumple, and shape using the
force/torque exerted and the angular positions of the actuators present in the
RH. The model would be integrated into the loop of a robotic hand (RH) and
haptic glove (HG), which would help reduce the latency in teleoperation.
( 2
min )
Large Language Models (LLMs) have demonstrated superior performance in
language understanding benchmarks. CALM, a popular approach, leverages the
linguistic priors of an LLM -- GPT-2 -- to recommend action candidates and
improve performance in Jericho text games without environment-provided actions.
However, CALM adapts GPT-2 with human-annotated gameplays and keeps the LLM
fixed while the text-based game is learned. In this work, we explore and
evaluate updating the LLM used for candidate recommendation during learning of
the text-based game as well, to mitigate the reliance on human-annotated
gameplays, which are costly to acquire. We observe that by updating the LLM
during learning using carefully selected in-game transitions, we can reduce
the dependency on human-annotated gameplays for fine-tuning the LLM. Further
analysis of the transferability of the updated LLMs shows that transferring
in-game trained models to other games does not result in consistent transfer.
( 2
min )
The ability to interpret spoken language is connected to natural language
processing. It involves teaching the AI how words relate to one another, how
they are meant to be used, and in what settings. The goal of natural language
processing (NLP) is to get a machine intelligence to process words the same way
a human brain does. This enables machine intelligence to interpret, arrange,
and comprehend textual data by processing the natural language. The technology
can comprehend what is communicated, whether through speech or writing, because
AI processes language more quickly than humans can. In the present study, five
NLP algorithms, namely, Gensim, Sumy, Luhn, Latent Semantic Analysis (LSA), and
the Kullback-Leibler (KL) algorithm, are implemented for the first time for the
knowledge summarization of High Entropy Alloys (HEAs). The performance of these
algorithms is assessed using the BLEU score and ROUGE score. The results showed
that the Luhn algorithm has the highest accuracy score for the knowledge
summarization task compared to the other algorithms used.
( 2
min )
This study presents a physics-informed machine learning-based control method
for nonlinear dynamic systems with highly noisy measurements. Existing
data-driven control methods that use machine learning for system identification
cannot effectively cope with highly noisy measurements, resulting in unstable
control performance. To address this challenge, the present study extends
current physics-informed machine learning capabilities for modeling nonlinear
dynamics with control and integrates them into a model predictive control
framework. To demonstrate the capability of the proposed method, we test and
validate it on two noisy nonlinear dynamic systems: the chaotic Lorenz 3
system and a turning machine tool. Analysis of the results illustrates that the
proposed
method outperforms state-of-the-art benchmarks as measured by both modeling
accuracy and control performance for nonlinear dynamic systems under high-noise
conditions.
( 2
min )
The industry of quantum technologies is rapidly expanding, offering promising
opportunities for various scientific domains. Among these emerging
technologies, Quantum Machine Learning (QML) has attracted considerable
attention due to its potential to revolutionize data processing and analysis.
In this paper, we investigate the application of QML in the field of remote
sensing. It is believed that QML can provide valuable insights for the analysis of
data from space. We delve into the common beliefs surrounding the quantum
advantage in QML for remote sensing and highlight the open challenges that need
to be addressed. To shed light on the challenges, we conduct a study focused on
the problem of kernel value concentration, a phenomenon that adversely affects
the runtime of quantum computers. Our findings indicate that while this issue
negatively impacts quantum computer performance, it does not entirely negate
the potential quantum advantage in QML for remote sensing.
( 2
min )
This paper proposes a metric to measure the dissimilarity between graphs that
may have a different number of nodes. The proposed metric extends the
generalised optimal subpattern assignment (GOSPA) metric, which is a metric for
sets, to graphs. The proposed graph GOSPA metric includes costs associated with
node attribute errors for properly assigned nodes, missed and false nodes and
edge mismatches between graphs. The computation of this metric is based on
finding the optimal assignments between nodes in the two graphs, with the
possibility of leaving some of the nodes unassigned. We also propose a lower
bound for the metric, which is also a metric for graphs and is computable in
polynomial time using linear programming. The metric is first derived for
undirected unweighted graphs and it is then extended to directed and weighted
graphs. The properties of the metric are demonstrated via simulated and
empirical datasets.
( 2
min )
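The assignment idea behind the metric, pairing nodes optimally while paying a fixed penalty for each unassigned node, can be sketched for plain attribute sets (ignoring the edge-mismatch costs of the full graph GOSPA metric); the brute-force search and costs below are illustrative only:

```python
from itertools import permutations

def set_assignment_cost(xs, ys, miss_cost):
    """Toy GOSPA-style cost between two attribute sets: optimal pairing
    cost (per-pair cost capped at miss_cost) plus a fixed penalty for each
    unassigned node. Brute-force search; fine for small examples only."""
    if len(xs) > len(ys):
        xs, ys = ys, xs
    best = float("inf")
    for perm in permutations(range(len(ys)), len(xs)):
        pair = sum(min(abs(xs[i] - ys[j]), miss_cost) for i, j in enumerate(perm))
        best = min(best, pair + miss_cost * (len(ys) - len(xs)))
    return best

# Two nodes match closely; the third node of the larger set is penalized.
cost = set_assignment_cost([1.0, 5.0], [1.2, 5.1, 9.0], miss_cost=2.0)
```

The paper's metric additionally scores edge mismatches under the chosen node assignment and gives a polynomial-time lower bound via linear programming.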
Despite the significant interest and progress in reinforcement learning (RL)
problems with adversarial corruption, current works are either confined to the
linear setting or lead to an undesired $\tilde{O}(\sqrt{T}\zeta)$ regret bound,
where $T$ is the number of rounds and $\zeta$ is the total amount of
corruption. In this paper, we consider the contextual bandit with general
function approximation and propose a computationally efficient algorithm to
achieve a regret of $\tilde{O}(\sqrt{T}+\zeta)$. The proposed algorithm relies
on the recently developed uncertainty-weighted least-squares regression from
linear contextual bandit and a new weighted estimator of uncertainty for the
general function class. In contrast to the existing analysis that heavily
relies on the linear structure, we develop a novel technique to control the sum
of weighted uncertainty, thus establishing the final regret bounds. We then
generalize our algorithm to the episodic MDP setting and first achieve an
additive dependence on the corruption level $\zeta$ in the scenario of general
function approximation. Notably, our algorithms achieve regret bounds that
either nearly match the performance lower bound or improve on existing methods
for all corruption levels, in both the known and unknown $\zeta$ cases.
( 2
min )
Vision Transformers (ViTs) with self-attention modules have recently achieved
great empirical success in many vision tasks. Due to non-convex interactions
across layers, however, theoretical learning and generalization analysis is
mostly elusive. Based on a data model characterizing both label-relevant and
label-irrelevant tokens, this paper provides the first theoretical analysis of
training a shallow ViT, i.e., one self-attention layer followed by a two-layer
perceptron, for a classification task. We characterize the sample complexity to
achieve a zero generalization error. Our sample complexity bound is positively
correlated with the inverse of the fraction of label-relevant tokens, the token
noise level, and the initial model error. We also prove that a training process
using stochastic gradient descent (SGD) leads to a sparse attention map, which
is a formal verification of the general intuition about the success of
attention. Moreover, this paper indicates that a proper token sparsification
can improve the test performance by removing label-irrelevant and/or noisy
tokens, including spurious correlations. Empirical experiments on synthetic
data and CIFAR-10 dataset justify our theoretical results and generalize to
deeper ViTs.
( 2
min )
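The shallow architecture analyzed, one self-attention layer followed by a two-layer perceptron, can be sketched as a toy forward pass with random weights (the dimensions and mean-pooling choice below are assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shallow_vit(tokens, Wq, Wk, Wv, W1, W2):
    """One self-attention layer followed by a two-layer perceptron
    (toy forward pass only, no training)."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # (n_tokens, n_tokens)
    ctx = attn @ V                                 # attention-weighted values
    hidden = np.maximum(ctx @ W1, 0.0)             # ReLU perceptron layer
    logits = hidden @ W2
    return logits.mean(axis=0)                     # pool over tokens

n, d, h, c = 4, 8, 16, 2   # tokens, embed dim, hidden width, classes
tokens = rng.standard_normal((n, d))
out = shallow_vit(tokens,
                  rng.standard_normal((d, d)), rng.standard_normal((d, d)),
                  rng.standard_normal((d, d)), rng.standard_normal((d, h)),
                  rng.standard_normal((h, c)))
```

The paper's sparse-attention result says that, after SGD training, the `attn` rows concentrate on label-relevant tokens.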
Approximate inference methods like the Laplace method, Laplace approximations
and variational methods, amongst others, are popular methods when exact
inference is not feasible due to the complexity of the model or the abundance
of data. In this paper we propose a hybrid approximate method called Low-Rank
Variational Bayes correction (VBC), that uses the Laplace method and
subsequently a Variational Bayes correction in a lower dimension, to the joint
posterior mean. The cost is essentially that of the Laplace method which
ensures scalability of the method, in both model complexity and data size.
Models with fixed and unknown hyperparameters are considered, for simulated and
real examples, for small and large datasets.
( 2
min )
We extend PAC-Bayesian theory to generative models and develop generalization
bounds for models based on the Wasserstein distance and the total variation
distance. Our first result on the Wasserstein distance assumes the instance
space is bounded, while our second result takes advantage of dimensionality
reduction. Our results naturally apply to Wasserstein GANs and Energy-Based
GANs, and our bounds provide new training objectives for these two. Although
our work is mainly theoretical, we perform numerical experiments showing
non-vacuous generalization bounds for Wasserstein GANs on synthetic datasets.
( 2
min )
We derive and study time-uniform confidence spheres - termed confidence
sphere sequences (CSSs) - which contain the mean of random vectors with high
probability simultaneously across all sample sizes. Inspired by the original
work of Catoni and Giulini, we unify and extend their analysis to cover both
the sequential setting and to handle a variety of distributional assumptions.
More concretely, our results include an empirical-Bernstein CSS for bounded
random vectors (resulting in a novel empirical-Bernstein confidence interval),
a CSS for sub-$\psi$ random vectors, and a CSS for heavy-tailed random vectors
based on a sequentially valid Catoni-Giulini estimator. Finally, we provide a
version of our empirical-Bernstein CSS that is robust to contamination by Huber
noise.
( 2
min )
Randomized experiments are a powerful methodology for data-driven evaluation
of decisions or interventions. Yet, their validity may be undermined by network
interference. This occurs when the treatment of one unit impacts not only its
outcome but also that of connected units, biasing traditional treatment effect
estimations. Our study introduces a new framework to accommodate complex and
unknown network interference, moving beyond specialized models in the existing
literature. Our framework, which we term causal message-passing, is grounded in
a high-dimensional approximate message passing methodology and is specifically
tailored to experimental design settings with prevalent network interference.
Utilizing causal message-passing, we present a practical algorithm for
estimating the total treatment effect and demonstrate its efficacy in four
numerical scenarios, each with its unique interference structure.
( 2
min )
A mixture of multivariate Poisson-log normal factor analyzers is introduced
by imposing constraints on the covariance matrix, resulting in flexible models
for clustering purposes. In particular, a class of eight parsimonious mixture
models based on the mixture of factor analyzers model is introduced.
Variational Gaussian approximation is used for parameter estimation, and
information criteria are used for model selection. The proposed models are
explored in the context of clustering discrete data arising from RNA sequencing
studies. Using real and simulated data, the models are shown to give favourable
clustering performance. The GitHub R package for this work is available at
https://github.com/anjalisilva/mixMPLNFA and is released under the open-source
MIT license.
( 2
min )
We provide the first useful, rigorous analysis of ensemble sampling for the
stochastic linear bandit setting. In particular, we show that, under standard
assumptions, for a $d$-dimensional stochastic linear bandit with an interaction
horizon $T$, ensemble sampling with an ensemble of size $m$ on the order of $d
\log T$ incurs regret bounded by order $(d \log T)^{5/2} \sqrt{T}$. Ours is the
first result in any structured setting not to require the size of the ensemble
to scale linearly with $T$ -- which defeats the purpose of ensemble sampling --
while obtaining near $\sqrt{T}$ order regret. Ours is also the first result
that allows infinite action sets.
( 2
min )
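A toy version of ensemble sampling on a linear bandit, with m perturbed least-squares estimates sharing one Gram matrix, might look like the following; the perturbation scheme, regularization, and problem instance are illustrative simplifications, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(1)

class EnsembleSampling:
    """Toy ensemble sampling for a d-dimensional stochastic linear bandit:
    m perturbed least-squares estimates; each round one ensemble member is
    drawn and its greedy action is played (illustrative sketch only)."""

    def __init__(self, d, m, lam=1.0, noise=0.1):
        self.A = np.eye(d) * lam      # shared regularized Gram matrix
        self.b = np.zeros((m, d))     # per-member perturbed reward targets
        self.m, self.noise = m, noise

    def act(self, actions):
        j = rng.integers(self.m)      # draw a random ensemble member
        theta = np.linalg.solve(self.A, self.b[j])
        return int(np.argmax(actions @ theta))

    def update(self, x, r):
        self.A += np.outer(x, x)
        # each member sees the reward plus its own Gaussian perturbation
        self.b += x * (r + self.noise * rng.standard_normal((self.m, 1)))

d, m = 3, 5
agent = EnsembleSampling(d, m)
actions = np.eye(d)                   # canonical basis actions
theta_star = np.array([0.9, 0.1, 0.0])
for _ in range(200):
    i = agent.act(actions)
    r = actions[i] @ theta_star + 0.1 * rng.standard_normal()
    agent.update(actions[i], r)
picked = agent.act(actions)
```

The paper's point is that an ensemble size of order d log T already suffices, rather than one growing linearly with T.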
Feedforward neural networks (FNNs) are typically viewed as pure prediction
algorithms, and their strong predictive performance has led to their use in
many machine-learning applications. However, their flexibility comes with an
interpretability trade-off; thus, FNNs have been historically less popular
among statisticians. Nevertheless, classical statistical theory, such as
significance testing and uncertainty quantification, is still relevant.
Supplementing FNNs with methods of statistical inference, and covariate-effect
visualisations, can shift the focus away from black-box prediction and make
FNNs more akin to traditional statistical models. This can allow for more
inferential analysis, and, hence, make FNNs more accessible within the
statistical-modelling context.
( 2
min )
Over the last decades, the family of $\alpha$-stable distributions has proven
to be useful for modelling in telecommunication systems. Particularly, in the
case of radar applications, finding a fast and accurate estimation of the
amplitude density function parameters appears to be very important. In this
work, the maximum likelihood estimator (MLE) is proposed for the parameters of
the amplitude distribution. To do this, the amplitude data are \emph{projected}
on the horizontal and vertical axes using two simple transformations. It is
proved that the \emph{projected} data follow a zero-location symmetric
$\alpha$-stable distribution for which the MLE can be computed quite fast. The
average of the MLEs computed from the two \emph{projections} is taken as the
estimator for the parameters of the amplitude distribution. The performance of
the proposed \emph{projection} method is demonstrated through a simulation
study and analysis of two sets of real radar data.
( 2
min )
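A sketch of the projection idea, attaching uniform phases to amplitude samples and projecting onto the two axes, is below; both the specific transformation and the heavy-tailed Cauchy stand-in for the amplitude data are assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(42)

def project_amplitudes(r: np.ndarray):
    """Hypothetical projection step: attach i.i.d. uniform phases to the
    amplitude samples and project onto the horizontal and vertical axes,
    producing zero-location symmetric samples."""
    phi = rng.uniform(0.0, 2.0 * np.pi, size=r.shape)
    return r * np.cos(phi), r * np.sin(phi)

r = np.abs(rng.standard_cauchy(100_000))  # heavy-tailed stand-in amplitudes
x, y = project_amplitudes(r)
```

Each projected sample is symmetric about zero, so a fast symmetric α-stable MLE can be applied to `x` and `y` separately and the two estimates averaged.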
AutoML allows you to derive rapid, general insights from your data right at the beginning of a machine learning (ML) project lifecycle. Understanding up front which preprocessing techniques and algorithm types provide best results reduces the time to develop, train, and deploy the right model. It plays a crucial role in every model’s development process […]
( 14
min )
Llama 2 stands at the forefront of AI innovation, embodying an advanced auto-regressive language model developed on a sophisticated transformer foundation. It’s tailored to address a multitude of applications in both the commercial and research domains with English as the primary linguistic concentration. Its model parameters scale from an impressive 7 billion to a remarkable […]
( 18
min )
An established financial services firm with over 140 years in business, Principal is a global investment management leader and serves more than 62 million customers around the world. Principal is conducting enterprise-scale near-real-time analytics to deliver a seamless and hyper-personalized omnichannel customer experience on their mission to make financial security accessible for all. They are […]
( 10
min )
Prompt engineering has become an essential skill for anyone working with large language models (LLMs) to generate high-quality and relevant texts. Although text prompt engineering has been widely discussed, visual prompt engineering is an emerging field that requires attention. Visual prompts can include bounding boxes or masks that guide vision models in generating relevant and […]
( 13
min )
The telecommunications industry — the backbone of today’s interconnected world — is valued at a staggering $1.7 trillion globally, according to IDC. It’s a massive operation, as telcos process hundreds of petabytes of data in their networks each day. That magnitude is only increasing, as the total amount of data transacted globally is forecast to Read article >
( 6
min )
Automotive companies are transforming every phase of their product lifecycle — evolving their primarily physical, manual processes into software-driven, AI-enhanced digital systems. To help them save costs and reduce lead times, NVIDIA is announcing two new simulation engines on Omniverse Cloud: the virtual factory simulation engine and the autonomous vehicle (AV) simulation engine. Omniverse Cloud, Read article >
( 6
min )
As NVIDIA continues to collaborate with Microsoft to build state-of-the-art AI infrastructure, Microsoft is introducing additional H100-based virtual machines to Microsoft Azure to accelerate demanding AI workloads. At its Ignite conference in Seattle today, Microsoft announced its new NC H100 v5 VM series for Azure, the industry’s first cloud instances featuring NVIDIA H100 NVL GPUs. Read article >
( 5
min )
Today’s landscape of free, open-source large language models (LLMs) is like an all-you-can-eat buffet for enterprises. This abundance can be overwhelming for developers building custom generative AI applications, as they need to navigate unique project and business requirements, including compatibility, security and the data used to train the models. NVIDIA AI Foundation Models — a Read article >
( 5
min )
Artificial intelligence on Windows 11 PCs marks a pivotal moment in tech history, revolutionizing experiences for gamers, creators, streamers, office workers, students and even casual PC users. It offers unprecedented opportunities to enhance productivity for users of the more than 100 million Windows PCs and workstations that are powered by RTX GPUs. And NVIDIA RTX Read article >
( 7
min )
Computer vision enables contact-free 3D printing, letting engineers print with high-performance materials they couldn’t use before.
( 11
min )
Lithium-ion (Li-ion) batteries have gained widespread popularity
across various industries, from powering portable electronic devices to
propelling electric vehicles and supporting energy storage systems. A central
challenge in Li-ion battery reliability lies in accurately predicting their
Remaining Useful Life (RUL), which is a critical measure for proactive
maintenance and predictive analytics. This study presents a novel approach that
harnesses the power of multiple denoising modules, each trained to address
specific types of noise commonly encountered in battery data. Specifically, a
denoising auto-encoder and a wavelet denoiser are used to generate
encoded/decomposed representations, which are subsequently processed through
dedicated self-attention transformer encoders. After extensive experimentation
on NASA and CALCE data, a broad spectrum of health indicator values is
estimated under a set of diverse noise patterns. The reported error metrics on
these data are on par with or better than the state-of-the-art reported in
recent literature.
( 2
min )
Air quality forecasting has garnered significant attention recently, with
data-driven models taking center stage due to advancements in machine learning
and deep learning models. However, researchers face challenges with complex
data acquisition and the lack of open-sourced datasets, hindering efficient
model validation. This paper introduces PurpleAirSF, a comprehensive and easily
accessible dataset collected from the PurpleAir network. With its high temporal
resolution, various air quality measures, and diverse geographical coverage,
this dataset serves as a useful tool for researchers aiming to develop novel
forecasting models, study air pollution patterns, and investigate their impacts
on health and the environment. We present a detailed account of the data
collection and processing methods employed to build PurpleAirSF. Furthermore,
we conduct preliminary experiments using both classic and modern
spatio-temporal forecasting models, thereby establishing a benchmark for future
air quality forecasting tasks.
( 2
min )
Commonsense norms are defeasible by context: reading books is usually great,
but not when driving a car. While contexts can be explicitly described in
language, in embodied scenarios, contexts are often provided visually. This
type of visually grounded reasoning about defeasible commonsense norms is
generally easy for humans, but (as we show) poses a challenge for machines, as
it necessitates both visual understanding and reasoning about commonsense
norms. We construct a new multimodal benchmark for studying visually grounded
commonsense norms: NORMLENS. NORMLENS consists of 10K human judgments
accompanied by free-form explanations covering 2K multimodal situations, and
serves as a probe to address two questions: (1) to what extent can models align
with average human judgment? and (2) how well can models explain their
predicted judgments? We find that state-of-the-art model judgments and
explanations are not well-aligned with human annotation. Additionally, we
present a new approach to better align models with humans by distilling social
commonsense knowledge from large language models. The data and code are
released at https://seungjuhan.me/normlens.
( 3
min )
This paper presents a novel fast machine learning method that leverages two
techniques: Vector Embedding on Orthonormal Basis (VEOB) and Spectral Transform
(ST). The VEOB converts the original data encoding into a vector embedding with
coordinates projected onto orthonormal bases. The Singular Value Decomposition
(SVD) technique is used to calculate the vector basis and projection
coordinates, leading to an enhanced distance measurement in the embedding space
and facilitating data compression by preserving the projection vectors
associated with the largest singular values. ST, in turn, transforms a
sequence of vectors into the spectral domain. By applying the Discrete Cosine
Transform (DCT) and selecting the most significant components, it streamlines
the handling of lengthy vector sequences. The paper provides examples of word
embedding, text chunk embedding, and image embedding, implemented in Julia
language with a vector database. It also investigates unsupervised learning and
supervised learning using this method, along with strategies for handling large
data volumes.
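The two transforms can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's Julia implementation: the random data matrix, the rank k, and the hand-rolled DCT-II are all assumptions made for the example.

```python
import numpy as np

# Hypothetical data matrix: rows are items, columns are raw features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 16))

# VEOB-style step: project onto the orthonormal basis from the SVD,
# keeping only the directions with the largest singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 4
coords = X @ Vt[:k].T        # embedding coordinates on the top-k basis
X_approx = coords @ Vt[:k]   # compressed reconstruction

# ST-style step: DCT-II of a vector sequence; keeping only the leading
# spectral components shortens the handling of long sequences.
def dct2(x):
    n = np.arange(len(x))
    return np.array([np.sum(x * np.cos(np.pi * (n + 0.5) * j / len(x)))
                     for j in range(len(x))])

spectrum = dct2(coords[:, 0])
```

Discarding trailing entries of `s` (and of `spectrum`) gives the compression described in the abstract.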
( 2
min )
We present and experimentally evaluate using transfer learning to address
experimental data scarcity when training neural network (NN) models for
Mach-Zehnder interferometer mesh-based optical matrix multipliers. Our approach
involves pre-training the model using synthetic data generated from a less
accurate analytical model and fine-tuning with experimental data. Our
investigation demonstrates that this method yields significant reductions in
modeling errors compared to using an analytical model, or a standalone NN model
when training data is limited. Utilizing regularization techniques and ensemble
averaging, we achieve < 1 dB root-mean-square error on the matrix weights
implemented by a 3x3 photonic chip while using only 25% of the available data.
( 2
min )
We introduce a value-based RL agent, which we call BBF, that achieves
super-human performance in the Atari 100K benchmark. BBF relies on scaling the
neural networks used for value estimation, as well as a number of other design
choices that enable this scaling in a sample-efficient manner. We conduct
extensive analyses of these design choices and provide insights for future
work. We end with a discussion about updating the goalposts for
sample-efficient RL research on the ALE. We make our code and data publicly
available at
https://github.com/google-research/google-research/tree/master/bigger_better_faster.
( 2
min )
For a specific class of sparse Gaussian graphical models, we provide a
closed-form solution for the determinant of the covariance matrix. In our
framework, the graphical interaction model (i.e., the covariance selection
model) is equal to the replacement product of $\mathcal{K}_{n}$ and
$\mathcal{K}_{n-1}$, where $\mathcal{K}_n$ is the complete graph with $n$
vertices. Our analysis is based on taking the Fourier transform of the local
factors of the model, which can be viewed as an application of the Normal
Factor Graph Duality Theorem and holographic algorithms. The closed-form
expression is obtained by applying the Matrix Determinant Lemma on the
transformed graphical model. In this context, we will also define a notion of
equivalence between two Gaussian graphical models.
( 2
min )
Bitcoin as a cryptocurrency has been one of the most important digital coins
and the first decentralized digital currency. Deep neural networks, on the
other hand, have shown promising results recently; however, they require a
huge amount of high-quality data to leverage their power. Some techniques,
such as augmentation, can help increase the dataset size, but they cannot be
applied to historical bitcoin data. As a result, we propose a
shallow Bidirectional-LSTM (Bi-LSTM) model, fed with feature engineered data
using our proposed method to forecast bitcoin closing prices in a daily time
frame. We compare the performance with that of other forecasting methods, and
show that with the help of the proposed feature engineering method, a shallow
deep neural network outperforms other popular price forecasting models.
( 2
min )
Driver stress is a major cause of car accidents and death worldwide.
Furthermore, persistent stress is a health problem, contributing to
hypertension and other diseases of the cardiovascular system. Stress has a
measurable impact on heart and breathing rates and stress levels can be
inferred from such measurements. Galvanic skin response is a common test to
measure the perspiration caused by both physiological and psychological stress,
as well as extreme emotions. In this paper, galvanic skin response is used to
estimate the ground truth stress levels. A feature selection technique based on
the minimal redundancy-maximal relevance method is then applied to multiple
heart rate variability and breathing rate metrics to identify a novel and
optimal combination for use in detecting stress. The support vector machine
algorithm with a radial basis function kernel was used along with these
features to reliably predict stress. The proposed method has achieved a high
level of accuracy on the target dataset.
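The feature-selection step can be sketched as a greedy mRMR loop. This is a simplified stand-in: the paper applies it to heart rate variability and breathing metrics, and mRMR is typically defined with mutual information, whereas the sketch below uses absolute Pearson correlation as both the relevance and redundancy score.

```python
import numpy as np

def mrmr_select(X, y, k):
    """Greedy minimal-redundancy-maximal-relevance feature selection.

    At each step, pick the feature with the highest relevance to the
    target minus its mean redundancy with the already-selected features.
    """
    n_feat = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                          for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

The selected columns would then feed an RBF-kernel SVM classifier, as in the paper.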
( 2
min )
Diagonal linear networks (DLNs) are a toy simplification of artificial neural
networks; they consist of a quadratic reparametrization of linear regression
inducing a sparse implicit regularization. In this paper, we describe the
trajectory of the gradient flow of DLNs in the limit of small initialization.
We show that incremental learning is effectively performed in the limit:
coordinates are successively activated, while the iterate is the minimizer of
the loss constrained to have support on the active coordinates only. This shows
that the sparse implicit regularization of DLNs decreases with time. This work
is restricted to the underparametrized regime with anti-correlated features for
technical reasons.
( 2
min )
Path reasoning methods over knowledge graphs have gained popularity for their
potential to improve transparency in recommender systems. However, the
resulting models still rely on pre-trained knowledge graph embeddings, fail to
fully exploit the interdependence between entities and relations in the KG for
recommendation, and may generate inaccurate explanations. In this paper, we
introduce PEARLM, a novel approach that efficiently captures user behaviour and
product-side knowledge through language modelling. With our approach, knowledge
graph embeddings are directly learned from paths over the KG by the language
model, which also unifies entities and relations in the same optimisation
space. Constraints on the sequence decoding additionally guarantee path
faithfulness with respect to the KG. Experiments on two datasets show the
effectiveness of our approach compared to state-of-the-art baselines. Source
code and datasets: AVAILABLE AFTER GETTING ACCEPTED.
( 2
min )
To facilitate reliable deployments of autonomous robots in the real world,
Out-of-Distribution (OOD) detection capabilities are often required. A powerful
approach for OOD detection is based on density estimation with Normalizing
Flows (NFs). However, we find that prior work with NFs attempts to match the
complex target distribution topologically with naive base distributions leading
to adverse implications. In this work, we circumvent this topological mismatch
using an expressive class-conditional base distribution trained with an
information-theoretic objective to match the required topology. The proposed
method enjoys the merits of wide compatibility with existing learned models
without any performance degradation and minimum computation overhead while
enhancing OOD detection capabilities. We demonstrate superior results in
density estimation and 2D object detection benchmarks in comparison with
extensive baselines. Moreover, we showcase the applicability of the method with
a real-robot deployment.
( 2
min )
Recommender systems are a vital information service on today's Internet.
Recently, graph neural networks have emerged as the leading approach for
recommender systems. We review recent literature on graph neural
network-based recommender systems, covering the background and development of
both recommender systems and graph neural networks. We then categorize
recommender systems by their settings and graph neural networks into spectral
and spatial models, and explore the motivation behind incorporating graph
neural networks into recommender systems. We also analyze challenges and open problems
in graph construction, embedding propagation and aggregation, and computation
efficiency. This guides us to better explore the future directions and
developments in this domain.
( 2
min )
Computer-assisted methods have emerged as valuable tools for retrosynthesis
analysis. However, quantifying the plausibility of generated retrosynthesis
routes remains a challenging task. We introduce Retro-BLEU, a statistical
metric adapted from the well-established BLEU score in machine translation, to
evaluate the plausibility of retrosynthesis routes based on reaction template
sequences analysis. We demonstrate the effectiveness of Retro-BLEU by applying
it to a diverse set of retrosynthesis routes generated by state-of-the-art
algorithms and compare the performance with other evaluation metrics. The
results show that Retro-BLEU is capable of differentiating between plausible
and implausible routes. Furthermore, we provide insights into the strengths and
weaknesses of Retro-BLEU, paving the way for future developments and
improvements in this field.
( 2
min )
We show how to compute the elements of a sequence $x_t = a_t x_{t-1} + b_t$
in parallel, given $t = (1, 2, \dots, n)$, $a_t \in \mathbb{R}^n$, $b_t \in
\mathbb{R}^n$, and initial value $x_0 \in \mathbb{R}$. On $n$ parallel
processors, the computation of $n$ elements incurs $\mathcal{O}(\log n)$ time
and $\mathcal{O}(n)$ space. Sequences of this form are ubiquitous in science
and engineering, making their parallelization useful for a vast number of
applications. We implement parallelization in software, test it on parallel
hardware, and verify that it executes faster than sequential computation by a
factor of $\frac{n}{\log n}$.
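The key observation is that the affine maps $x \mapsto a x + b$ compose associatively: applying $(a_1, b_1)$ then $(a_2, b_2)$ gives $(a_2 a_1, \; a_2 b_1 + b_2)$. An inclusive scan with this operator therefore evaluates the whole recurrence in $\mathcal{O}(\log n)$ combine rounds. A minimal NumPy sketch of a Hillis-Steele-style scan (not the authors' implementation; the parallel rounds are simulated sequentially here):

```python
import numpy as np

def scan_sequential(a, b, x0):
    # Reference O(n) loop: x_t = a_t * x_{t-1} + b_t.
    x, out = x0, []
    for at, bt in zip(a, b):
        x = at * x + bt
        out.append(x)
    return np.array(out)

def scan_parallel(a, b, x0):
    # Each round, element t combines with element t-shift using
    # (a, b) <- (a * a_prev, a * b_prev + b); after log2(n) rounds,
    # position t holds the composition of steps 1..t.
    a, b = a.astype(float).copy(), b.astype(float).copy()
    n, shift = len(a), 1
    while shift < n:
        a_prev = np.concatenate([np.ones(shift), a[:-shift]])
        b_prev = np.concatenate([np.zeros(shift), b[:-shift]])
        a, b = a * a_prev, a * b_prev + b
        shift *= 2
    return a * x0 + b   # x_t = (composed a) * x0 + (composed b)
```

On real parallel hardware each `while` iteration is one constant-time step across $n$ processors, giving the $\mathcal{O}(\log n)$ bound in the abstract.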
( 2
min )
ODTLearn is an open-source Python package that provides methods for learning
optimal decision trees for high-stakes predictive and prescriptive tasks based
on the mixed-integer optimization (MIO) framework proposed in Aghaei et al.
(2019) and several of its extensions. The current version of the package
provides implementations for learning optimal classification trees, optimal
fair classification trees, optimal classification trees robust to distribution
shifts, and optimal prescriptive trees from observational data. We have
designed the package to be easy to maintain and extend as new optimal decision
tree problem classes, reformulation strategies, and solution algorithms are
introduced. To this end, the package follows object-oriented design principles
and supports both commercial (Gurobi) and open source (COIN-OR branch and cut)
solvers. The package documentation and an extensive user guide can be found at
https://d3m-research-group.github.io/odtlearn/. Additionally, users can view
the package source code and submit feature requests and bug reports by visiting
https://github.com/D3M-Research-Group/odtlearn.
( 2
min )
We investigate the long-run behavior of single-server queues with Hawkes
arrivals and general service distributions and related optimization problems.
In detail, utilizing novel coupling techniques, we establish finite moment
bounds for the stationary distribution of the workload and busy period
processes. In addition, we show that these queueing processes converge
exponentially fast to their stationary distributions. Based on these
theoretical results, we develop an efficient numerical algorithm to solve the
optimal staffing problem for the Hawkes queues in a data-driven manner.
Numerical results indicate a sharp difference in staffing for Hawkes queues,
compared to the classic GI/GI/1 model, especially in the heavy-traffic regime.
( 2
min )
Gaussian Mixture Models (GMMs) are one of the most potent parametric density
models used extensively in many applications. Flexibly-tied factorization of
the covariance matrices in GMMs is a powerful approach for coping with the
challenges of common GMMs when faced with high-dimensional data and complex
densities which often demand a large number of Gaussian components. However,
the expectation-maximization algorithm for fitting flexibly-tied GMMs still
encounters difficulties with streaming and very large dimensional data. To
overcome these challenges, this paper suggests the use of first-order
stochastic optimization algorithms. Specifically, we propose a new stochastic
optimization algorithm on the manifold of orthogonal matrices. Through numerous
empirical results on both synthetic and real datasets, we observe that
stochastic optimization methods can outperform the expectation-maximization
algorithm in terms of attaining better likelihood, needing fewer epochs for
convergence, and consuming less time per epoch.
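One standard way to run first-order stochastic updates while keeping a factor matrix on the manifold of orthogonal matrices is to follow each Euclidean gradient step with a QR retraction. This is a generic manifold-optimization sketch under assumed step sizes, not the paper's specific algorithm:

```python
import numpy as np

def orthogonal_sgd_step(Q, grad, lr=0.1):
    """One stochastic gradient step constrained to orthogonal matrices.

    Take a Euclidean step, then retract back onto the manifold via QR,
    so that Q stays exactly orthogonal after every update.
    """
    Y = Q - lr * grad            # Euclidean gradient step
    Qn, R = np.linalg.qr(Y)      # QR retraction onto the manifold
    # Fix column signs so the retraction is deterministic
    # (equivalent to requiring a positive diagonal in R).
    return Qn * np.sign(np.diag(R))
```

In a flexibly-tied GMM, a step like this would update the shared orthogonal factor of the covariance parametrization from a minibatch gradient.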
( 2
min )
Recent works have shown that physics-inspired architectures allow the
training of deep graph neural networks (GNNs) without oversmoothing. The role
of these physics is unclear, however, with successful examples of both
reversible (e.g., Hamiltonian) and irreversible (e.g., diffusion) phenomena
producing comparable results despite diametrically opposed mechanisms, and
further complications arising due to empirical departures from mathematical
theory. This work presents a series of novel GNN architectures based upon
structure-preserving bracket-based dynamical systems, which are provably
guaranteed to either conserve energy or generate positive dissipation with
increasing depth. It is shown that the theoretically principled framework
employed here allows for inherently explainable constructions, which
contextualize departures from theory in current architectures and better
elucidate the roles of reversibility and irreversibility in network
performance.
( 2
min )
In recent years, language-driven artistic style transfer has emerged as a new
type of style transfer technique, eliminating the need for a reference style
image by using natural language descriptions of the style. The first model to
achieve this, called CLIPstyler, has demonstrated impressive stylisation
results. However, its lengthy optimisation procedure at runtime for each query
limits its suitability for many practical applications. In this work, we
present FastCLIPstyler, a generalised text-based image style transfer model
capable of stylising images in a single forward pass for arbitrary text inputs.
Furthermore, we introduce EdgeCLIPstyler, a lightweight model designed for
compatibility with resource-constrained devices. Through quantitative and
qualitative comparisons with state-of-the-art approaches, we demonstrate that
our models achieve superior stylisation quality based on measurable metrics
while offering significantly improved runtime efficiency, particularly on edge
devices.
( 2
min )
Associative memory architectures are designed for memorization but also
offer, through their retrieval method, a form of generalization to unseen
inputs: stored memories can be seen as prototypes from this point of view.
Focusing on Modern Hopfield Networks (MHN), we show that a large memorization
capacity undermines the generalization opportunity. We offer a solution to
better optimize this tradeoff. It relies on Minimum Description Length (MDL) to
determine during training which memories to store, as well as how many of them.
( 2
min )
We identify hidden layers inside a deep neural network (DNN) with group
actions on the data domain, and formulate a formal deep network as a dual voice
transform with respect to the Koopman operator, a linear representation of the
group action. Based on the group theoretic arguments, particularly by using
Schur's lemma, we show a simple proof of the universality of DNNs.
( 2
min )
Compared to "black-box" models, like random forests and deep neural networks,
explainable boosting machines (EBMs) are considered "glass-box" models that can
be competitively accurate while also maintaining a higher degree of
transparency and explainability. However, EBMs quickly become less transparent
and harder to interpret in high-dimensional settings with many predictor
variables; they also become more difficult to use in production due to
increases in scoring time. We propose a simple solution based on the least
absolute shrinkage and selection operator (LASSO) that can help introduce
sparsity by reweighting the individual model terms and removing the less
relevant ones, thereby allowing these models to maintain their transparency and
relatively fast scoring times in higher-dimensional settings. In short,
post-processing a fitted EBM with many (i.e., possibly hundreds or thousands)
of terms using the LASSO can help reduce the model's complexity and drastically
improve scoring time. We illustrate the basic idea using two real-world
examples with code.
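The post-processing idea can be sketched directly: treat the per-term contributions of a fitted additive model as columns of a design matrix and solve an L1-penalized least-squares problem over them; terms whose weights shrink to exactly zero are dropped. The sketch below simulates the term-contribution matrix rather than fitting a real EBM, and uses hand-rolled coordinate descent in place of a library LASSO solver:

```python
import numpy as np

def lasso_cd(T, y, alpha=0.05, n_iter=200):
    """L1-penalized least squares via coordinate descent.

    T is an (n_samples, n_terms) matrix of per-term contributions from
    an already-fitted additive model; y is the target. The returned
    weights reweight the terms, and exact zeros remove terms entirely.
    """
    n, p = T.shape
    w = np.zeros(p)
    col_sq = (T ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - T @ w + T[:, j] * w[j]   # residual excluding term j
            rho = T[:, j] @ r
            # Soft-thresholding update for the LASSO objective
            # (1/2n)||y - Tw||^2 + alpha * ||w||_1.
            w[j] = np.sign(rho) * max(abs(rho) - n * alpha, 0.0) / col_sq[j]
    return w

# Simulated term contributions: only the first two terms matter.
rng = np.random.default_rng(1)
T = rng.normal(size=(200, 10))
y = 2.0 * T[:, 0] - 1.5 * T[:, 1] + 0.1 * rng.normal(size=200)
w = lasso_cd(T, y)
```

Scoring then reduces to a weighted sum over the surviving terms only, which is where the reported speedup comes from.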
( 2
min )
We analyze geometric aspects of the gradient descent algorithm in Deep
Learning (DL) networks. In particular, we prove that the globally minimizing
weights and biases for the $\mathcal{L}^2$ cost obtained constructively in
[Chen-Munoz Ewald 2023] for underparametrized ReLU DL networks can generically
not be approximated via the gradient descent flow. We therefore conclude that
the method introduced in [Chen-Munoz Ewald 2023] is disjoint from the gradient
descent method.
( 2
min )
We derive the first large deviation rate function for the stochastic iterates
generated by policy gradient methods with a softmax parametrization and an
entropy regularized objective. Leveraging the contraction principle from large
deviations theory, we also develop a general recipe for deriving exponential
convergence rates for a wide spectrum of other policy parametrizations. This
approach unifies several results from the literature and simplifies existing
proof techniques.
( 2
min )
Learning nonparametric systems of Ordinary Differential Equations (ODEs)
$\dot{x} = f(t,x)$ from noisy data is an emerging machine learning topic. We use the
well-developed theory of Reproducing Kernel Hilbert Spaces (RKHS) to define
candidates for f for which the solution of the ODE exists and is unique.
Learning f consists of solving a constrained optimization problem in an RKHS.
We propose a penalty method that iteratively uses the Representer theorem and
Euler approximations to provide a numerical solution. We prove a generalization
bound for the L2 distance between x and its estimator and provide experimental
comparisons with the state-of-the-art.
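In the unconstrained special case, the recipe reduces to kernel ridge regression on Euler-difference targets, which the representer theorem makes finite-dimensional. A one-dimensional, autonomous NumPy sketch (the paper's constrained penalty method is more general; the Gaussian kernel, bandwidth, and penalty here are assumptions for illustration):

```python
import numpy as np

def fit_ode_rhs(x, dt, lam=1e-3, bandwidth=1.0):
    """Estimate f in x' = f(x) from a sampled trajectory.

    Euler differences (x_{t+1} - x_t) / dt supply noisy targets for f at
    the observed states, and kernel ridge regression gives a closed-form
    RKHS estimator via the representer theorem.
    """
    X = x[:-1]
    Y = (x[1:] - x[:-1]) / dt                       # Euler targets
    K = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * bandwidth ** 2))
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), Y)
    def f_hat(z):
        k = np.exp(-(z - X) ** 2 / (2 * bandwidth ** 2))
        return k @ alpha
    return f_hat

# Example: learn the right-hand side of x' = -x from its trajectory.
dt = 0.01
traj = np.exp(-np.arange(200) * dt)
f_hat = fit_ode_rhs(traj, dt)
```

The estimator `f_hat` should recover f(x) = -x on the range the trajectory visits.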
( 2
min )
Under differential privacy with sub-Gamma noise, we derive the asymptotic
properties of a class of binary-valued network models with a general link
function. In this paper, we release the degree sequences of the binary
networks under a general noisy mechanism, with the discrete Laplace
mechanism as a special case. We establish asymptotic results, including both
consistency and asymptotic normality of the parameter estimator, when the
number of parameters goes to infinity in a class of network models. Simulations
and a real data example are provided to illustrate asymptotic results.
( 2
min )
Most prognostic methods require a decent amount of data for model training.
In reality, however, the amount of historical data owned by a single
organization might be small or not large enough to train a reliable prognostic
model. To address this challenge, this article proposes a federated prognostic
model that allows multiple users to jointly construct a failure time prediction
model using their multi-stream, high-dimensional, and incomplete data while
keeping each user's data local and confidential. The prognostic model first
employs multivariate functional principal component analysis to fuse the
multi-stream degradation signals. Then, the fused features coupled with the
times-to-failure are utilized to build a (log)-location-scale regression model
for failure prediction. To estimate parameters using distributed datasets and
keep the data privacy of all participants, we propose a new federated algorithm
for feature extraction. Numerical studies indicate that the performance of the
proposed model is the same as that of classic non-federated prognostic models
and is better than that of the models constructed by each user itself.
( 2
min )
This paper introduces $\textit{arfpy}$, a python implementation of
Adversarial Random Forests (ARF) (Watson et al., 2023), which is a lightweight
procedure for synthesizing new data that resembles some given data. The
software $\textit{arfpy}$ equips practitioners with straightforward
functionalities for both density estimation and generative modeling. The method
is particularly useful for tabular data and its competitive performance is
demonstrated in previous literature. As a major advantage over the mostly deep
learning based alternatives, $\textit{arfpy}$ combines the method's reduced
requirements in tuning efforts and computational resources with a user-friendly
python interface. This supplies audiences across scientific fields with
software to generate data effortlessly.
( 2
min )
We derive the existence of a new type of neural network, called a compact
matrix quantum group equivariant neural network, that learns from data that has
an underlying quantum symmetry. We apply the Woronowicz formulation of
Tannaka-Krein duality to characterise the weight matrices that appear in these
neural networks for any easy compact matrix quantum group. We show that compact
matrix quantum group equivariant neural networks contain, as a subclass, all
compact matrix group equivariant neural networks. Moreover, we obtain
characterisations of the weight matrices for many compact matrix group
equivariant neural networks that have not previously appeared in the machine
learning literature.
( 2
min )
Online communities are driving user engagement across industries like gaming, social media, ecommerce, dating, and e-learning. Members of these online communities trust platform owners to provide a safe and inclusive environment where they can freely consume content and contribute. Content moderators are often employed to review user-generated content and check that it’s safe and compliant […]
( 7
min )
Today, we are excited to announce the capability to fine-tune the Mistral 7B model using Amazon SageMaker JumpStart. You can now fine-tune and deploy Mistral text generation models on SageMaker JumpStart using the Amazon SageMaker Studio UI with a few clicks or using the SageMaker Python SDK. Foundation models perform very well with generative tasks, […]
( 10
min )
Fake news, defined as news that conveys or incorporates false, fabricated, or deliberately misleading information, has been around as early as the emergence of the printing press. The rapid spread of fake news and disinformation online is not only deceiving to the public, but can also have a profound impact on society, politics, economy, and […]
( 17
min )
In the era of big data and AI, companies are continually seeking ways to use these technologies to gain a competitive edge. One of the hottest areas in AI right now is generative AI, and for good reason. Generative AI offers powerful solutions that push the boundaries of what’s possible in terms of creativity and […]
( 13
min )
Data analysts must have a strong grasp of practical data visualization skills to paint a clear picture of complex data for a broader audience. Seeing the big picture by delivering coherent and easily comprehensible content is crucial. Companies highly value avant-garde data analysts who can not only dig into the data but also connect the… Read More »Beyond the numbers: The soft skills that elevate data analysts to the next level
The post Beyond the numbers: The soft skills that elevate data analysts to the next level appeared first on Data Science Central.
( 21
min )
Given how quickly the digital marketing industry changes, keeping up with the most recent trends and technologies can be challenging. Traditional marketing techniques are no longer enough to connect with and engage your target audience. Machine learning in marketing has got your back. Check out the top 10 use cases and implementation pointers that offer… Read More »Machine learning in marketing: 10 use cases and implementation tips
The post Machine learning in marketing: 10 use cases and implementation tips appeared first on Data Science Central.
( 24
min )
In the last two years, I published 5 machine learning and AI books, including one on synthetic data by Elsevier. This represents over 800 pages of compact, state-of-the-art material. The new addition features my most recent advances: the problems that I encountered with generative adversarial networks, and how I overcome them with new techniques. The… Read More »New Book: Statistical Optimization for GenAI and Machine Learning
The post New Book: Statistical Optimization for GenAI and Machine Learning appeared first on Data Science Central.
( 21
min )
Character animator Sir Wade Neistadt works to make animation and 3D education more accessible for aspiring and professional artists alike through video tutorials and industry training.
( 8
min )
Sparse matrix representations are ubiquitous in computational science and
machine learning, leading to significant reductions in compute time, in
comparison to dense representation, for problems that have local connectivity.
The adoption of sparse representation in leading ML frameworks such as PyTorch
is incomplete, however, with support for both automatic differentiation and GPU
acceleration missing. In this work, we present an implementation of a CSR-based
sparse matrix wrapper for PyTorch with CUDA acceleration for basic matrix
operations, as well as automatic differentiability. We also present several
applications of the resulting sparse kernels to optimization problems,
demonstrating ease of implementation and performance measurements versus their
dense counterparts.
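As a library-agnostic sketch of why CSR representations pay off (using SciPy rather than the paper's PyTorch wrapper, with matrix size and density chosen purely for illustration), a sparse matrix-vector product touches only the stored nonzeros:

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Build a large, very sparse random matrix directly in CSR format.
rng = np.random.default_rng(0)
A = sparse_random(5000, 5000, density=0.001, format="csr", random_state=0)
x = rng.standard_normal(5000)

# CSR stores only (data, indices, indptr), so the product visits
# roughly density * n^2 entries instead of all n^2.
y_sparse = A @ x
y_dense = A.toarray() @ x  # same result, far more work and memory

assert np.allclose(y_sparse, y_dense)
```

The paper's contribution is layering CUDA acceleration and automatic differentiation on top of exactly this kind of CSR kernel inside PyTorch.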
( 2
min )
Deep learning-based vision is characterized by intricate frameworks that
often necessitate a profound understanding, presenting a barrier to newcomers
and limiting broad adoption. With many researchers grappling with the
constraints of smaller datasets, there's a pronounced reliance on pre-trained
neural networks, especially for tasks such as image classification. This
reliance is further intensified in niche imaging areas where obtaining vast
datasets is challenging. Despite the widespread use of transfer learning as a
remedy to the small dataset dilemma, a conspicuous absence of tailored auto-ML
solutions persists. Addressing these challenges is "Deep Fast Vision", a Python
library that streamlines the deep learning process. This tool offers a
user-friendly experience, enabling results through a simple nested dictionary
definition, helping to democratize deep learning for non-experts. Designed for
simplicity and scalability, Deep Fast Vision appears as a bridge, connecting
the complexities of existing deep learning frameworks with the needs of a
diverse user base.
( 2
min )
We present the new Orthogonal Polynomials Approximation Algorithm (OPAA), a
parallelizable algorithm that solves two problems from a functional analytic
approach: first, it finds a smooth functional estimate of a density function,
whether it is normalized or not; second, the algorithm provides an estimate of
the normalizing weight. In the context of Bayesian inference, OPAA provides an
estimate of the posterior function as well as the normalizing weight, which is
also known as the evidence.
A core component of OPAA is a special transform of the square root of the
joint distribution into a special functional space of our construct. Through
this transform, the evidence is equated with the $L^2$ norm of the transformed
function, squared. Hence, the evidence can be estimated by the sum of squares
of the transform coefficients. The computations can be parallelized and
completed in one pass.
To compute the transform coefficients, OPAA proposes a new computational
scheme leveraging Gauss--Hermite quadrature in higher dimensions. Not only does
it avoid the potential high variance problem associated with random sampling
methods, it also enables one to speed up the computation by parallelization,
and significantly reduces the complexity by a vector decomposition.
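In one dimension, the quadrature idea can be sketched as follows; this is a toy with an unnormalized Gaussian, not OPAA's higher-dimensional construction, and the target density and node count are illustrative choices:

```python
import numpy as np

# Gauss-Hermite rule: integral of f(x) e^{-x^2} dx ~= sum_i w_i f(x_i).
# For an unnormalized density p_tilde, the normalizing weight is
#   Z = integral of p_tilde(x) dx = sum_i w_i p_tilde(x_i) e^{x_i^2}.
nodes, weights = np.polynomial.hermite.hermgauss(40)

p_tilde = lambda x: np.exp(-2.0 * x**2)      # unnormalized, true Z = sqrt(pi/2)
Z = np.sum(weights * p_tilde(nodes) * np.exp(nodes**2))

assert abs(Z - np.sqrt(np.pi / 2)) < 1e-6    # deterministic, no sampling variance
```

Because the nodes and weights are fixed in advance, the evaluations of `p_tilde` are embarrassingly parallel, which is the property OPAA exploits.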
( 2
min )
Recent advances have shown that GP priors, or their finite realisations, can
be encoded using deep generative models such as variational autoencoders
(VAEs). These learned generators can serve as drop-in replacements for the
original priors during MCMC inference. While this approach enables efficient
inference, it loses information about the hyperparameters of the original
models, and consequently makes inference over hyperparameters impossible and
the learned priors indistinct. To overcome this limitation, we condition the
VAE on stochastic process hyperparameters. This allows the joint encoding of
hyperparameters with GP realizations and their subsequent estimation during
inference. Further, we demonstrate that our proposed method, PriorCVAE, is
agnostic to the nature of the models which it approximates, and can be used,
for instance, to encode solutions of ODEs. It provides a practical tool for
approximate inference and shows potential in real-life spatial and
spatiotemporal applications.
( 2
min )
The simulation of power system dynamics is a computationally expensive
task. Considering the growing uncertainty of generation and demand patterns,
thousands of scenarios need to be continuously assessed to ensure the safety of
power systems. Physics-Informed Neural Networks (PINNs) have recently emerged
as a promising solution for drastically accelerating computations of non-linear
dynamical systems. This work investigates the applicability of these methods
for power system dynamics, focusing on the dynamic response to load
disturbances. Comparing the prediction of PINNs to the solution of conventional
solvers, we find that PINNs can be 10 to 1000 times faster than conventional
solvers. At the same time, we find them to be sufficiently accurate and
numerically stable even for large time steps. To facilitate a deeper
understanding, this paper also presents a new regularisation of Neural Network
(NN) training by introducing a gradient-based term in the loss function. The
resulting NNs, which we call dtNNs, help us deliver a comprehensive analysis
about the strengths and weaknesses of the NN based approaches, how
incorporating knowledge of the underlying physics affects NN performance, and
how this compares with conventional solvers for power system dynamics.
( 2
min )
The relentless pursuit of miniaturization and performance enhancement in
electronic devices has led to a fundamental challenge in the field of circuit
design and simulation: how to accurately account for the inherent stochastic
nature of certain devices. While conventional deterministic models have served
as indispensable tools for circuit designers, they fall short when it comes to
capturing the subtle yet critical variability exhibited by many electronic
components. In this paper, we present an innovative approach that transcends
the limitations of traditional modeling techniques by harnessing the power of
machine learning, specifically Mixture Density Networks (MDNs), to faithfully
represent and simulate the stochastic behavior of electronic devices. We
demonstrate our approach to model heater cryotrons, where the model is able to
capture the stochastic switching dynamics observed in the experiment. Our model
shows 0.82% mean absolute error for switching probability. This paper marks a
significant step forward in the quest for accurate and versatile compact
models, poised to drive innovation in the realm of electronic circuits.
( 2
min )
We investigate how shallow ReLU networks interpolate between known regions.
Our analysis shows that empirical risk minimizers converge to a minimum norm
interpolant as the number of data points and parameters tends to infinity when
a weight decay regularizer is penalized with a coefficient which vanishes at a
precise rate as the network width and the number of data points grow. With and
without explicit regularization, we numerically study the implicit bias of
common optimization algorithms towards known minimum norm interpolants.
( 2
min )
A classic inferential statistical problem is the goodness-of-fit (GOF) test.
Such a test can be challenging when the hypothesized parametric model has an
intractable likelihood and its distributional form is not available. Bayesian
methods for GOF can be appealing due to their ability to incorporate expert
knowledge through prior distributions.
However, standard Bayesian methods for this test often require strong
distributional assumptions on the data and their relevant parameters. To
address this issue, we propose a semi-Bayesian nonparametric (semi-BNP)
procedure in the context of the maximum mean discrepancy (MMD) measure that can
be applied to the GOF test. Our method introduces a novel Bayesian estimator
for the MMD, enabling the development of a measure-based hypothesis test for
intractable models. Through extensive experiments, we demonstrate that our
proposed test outperforms frequentist MMD-based methods by achieving lower
false rejection and acceptance rates of the null hypothesis. Furthermore, we
showcase the versatility of our approach by embedding the proposed estimator
within a generative adversarial network (GAN) framework. It facilitates a
robust BNP learning approach as another significant application of our method.
With our BNP procedure, this new GAN approach can enhance sample diversity and
improve inferential accuracy compared to traditional techniques.
( 3
min )
Real-time density estimation is ubiquitous in many applications, including
computer vision and signal processing. Kernel density estimation is arguably
one of the most commonly used density estimation techniques, and the use of
"sliding window" mechanism adapts kernel density estimators to dynamic
processes. In this paper, we derive the asymptotic mean integrated squared
error (AMISE) upper bound for the "sliding window" kernel density estimator.
This upper bound provides a principled guide to devise a novel estimator, which
we name the temporal adaptive kernel density estimator (TAKDE). Compared to
heuristic approaches for "sliding window" kernel density estimators, TAKDE is
theoretically optimal in terms of the worst-case AMISE. We provide numerical
experiments using synthetic and real-world datasets, showing that TAKDE
outperforms other state-of-the-art dynamic density estimators (including those
outside of kernel family). In particular, TAKDE achieves a superior test
log-likelihood with a smaller runtime.
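A plain sliding-window Gaussian KDE can be sketched as below. This shows only the generic mechanism, with a hypothetical fixed window and bandwidth; TAKDE's contribution is choosing these quantities to be worst-case optimal under the AMISE bound:

```python
import numpy as np
from collections import deque

class SlidingWindowKDE:
    """Gaussian KDE over the most recent `window` samples of a stream."""

    def __init__(self, window=200, bandwidth=0.3):
        self.samples = deque(maxlen=window)  # old samples fall out automatically
        self.h = bandwidth

    def update(self, x):
        self.samples.append(x)

    def density(self, x):
        s = np.asarray(self.samples)
        z = (x - s) / self.h
        return np.mean(np.exp(-0.5 * z**2)) / (self.h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
kde = SlidingWindowKDE()
for t in range(1000):                # the stream drifts from N(0,1) to N(3,1)
    mu = 0.0 if t < 500 else 3.0
    kde.update(rng.normal(mu, 1.0))

# After the change point, the estimator tracks the new mode near 3.
assert kde.density(3.0) > kde.density(0.0)
```

The window discards stale samples so the estimate adapts to the dynamic process; a fixed bandwidth like the one above is exactly the heuristic that TAKDE replaces.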
( 2
min )
Forecasting healthcare time series is crucial for early detection of adverse
outcomes and for patient monitoring. Forecasting, however, can be difficult in
practice due to noisy and intermittent data. The challenges are often
exacerbated by change points induced via extrinsic factors, such as the
administration of medication. To address these challenges, we propose a novel
hybrid global-local architecture and a pharmacokinetic encoder that informs
deep learning models of patient-specific treatment effects. We showcase the
efficacy of our approach in achieving significant accuracy gains for a blood
glucose forecasting task using both realistically simulated and real-world
data. Our global-local architecture improves over patient-specific models by
9.2-14.6%. Additionally, our pharmacokinetic encoder improves over alternative
encoding techniques by 4.4% on simulated data and 2.1% on real-world data. The
proposed approach can have multiple beneficial applications in clinical
practice, such as issuing early warnings about unexpected treatment responses,
or helping to characterize patient-specific treatment effects in terms of drug
absorption and elimination characteristics.
( 2
min )
Progressing towards a new era of Artificial Intelligence (AI) - enabled
wireless networks, concerns regarding the environmental impact of AI have been
raised both in industry and academia. Federated Learning (FL) has emerged as a
key privacy-preserving decentralized AI technique. Despite efforts currently
being made in FL, its environmental impact is still an open problem. Targeting
the minimization of the overall energy consumption of an FL process, we propose
the orchestration of computational and communication resources of the involved
devices to minimize the total energy required, while guaranteeing a certain
performance of the model. To this end, we propose a Soft Actor Critic Deep
Reinforcement Learning (DRL) solution, where a penalty function is introduced
during training, penalizing the strategies that violate the constraints of the
environment, and contributing towards a safe RL process. A device level
synchronization method, along with a computationally cost-effective FL
environment are proposed, with the goal of further reducing the energy
consumption and communication overhead. Evaluation results show the
effectiveness and robustness of the proposed scheme compared to four
state-of-the-art baseline solutions on different network environments and FL
architectures, achieving a decrease of up to 94% in the total energy
consumption.
( 3
min )
Early-exit neural networks (EENNs) facilitate adaptive inference by producing
predictions at multiple stages of the forward pass. In safety-critical
applications, these predictions are only meaningful when complemented with
reliable uncertainty estimates. Yet, due to their sequential structure, an
EENN's uncertainty estimates should also be consistent: labels that are deemed
improbable at one exit should not reappear within the confidence interval / set
of later exits. We show that standard uncertainty quantification techniques,
like Bayesian methods or conformal prediction, can lead to inconsistency across
exits. We address this problem by applying anytime-valid confidence sequences
(AVCSs) to the exits of EENNs. By design, AVCSs maintain consistency across
exits. We examine the theoretical and practical challenges of applying AVCSs to
EENNs and empirically validate our approach on both regression and
classification tasks.
( 2
min )
How do language models deal with the limited bandwidth of the residual
stream? Prior work has suggested that some attention heads and MLP layers may
perform a "memory management" role: clearing residual stream directions set by
earlier layers by reading in information and writing out the negative version.
phenomenon in a 4-layer transformer. We identify several heads in layer 2 that
consistently remove the output of a single layer 0 head. We then verify that
this erasure causally depends on the original written direction. We further
demonstrate that direct logit attribution (DLA) suggests that writing and
erasing heads directly contribute to predictions, when in fact their effects
cancel out. Then we present adversarial prompts for which this effect is
particularly salient. These findings reveal that memory management can make DLA
results misleading. Accordingly, we make concrete recommendations for circuit
analysis to prevent interpretability illusions.
( 2
min )
We present an oracle-efficient relaxation for the adversarial contextual
bandits problem, where the contexts are sequentially drawn i.i.d from a known
distribution and the cost sequence is chosen by an online adversary. Our
algorithm has a regret bound of
$O(T^{\frac{2}{3}}(K\log(|\Pi|))^{\frac{1}{3}})$ and makes at most $O(K)$ calls
per round to an offline optimization oracle, where $K$ denotes the number of
actions, $T$ denotes the number of rounds and $\Pi$ denotes the set of
policies. This is the first result to improve the prior best bound of
$O((TK)^{\frac{2}{3}}(\log(|\Pi|))^{\frac{1}{3}})$ as obtained by Syrgkanis et
al. at NeurIPS 2016, and the first to match the original bound of Langford and
Zhang at NeurIPS 2007 which was obtained for the stochastic case.
( 2
min )
Bayesian bandit algorithms with approximate Bayesian inference have been
widely used in real-world applications. However, there is a large discrepancy
between the superior practical performance of these approaches and their
theoretical justification. Previous research only indicates a negative
theoretical result: Thompson sampling could have a worst-case linear regret
$\Omega(T)$ with a constant threshold on the inference error measured by one
$\alpha$-divergence. To bridge this gap, we propose an Enhanced Bayesian Upper
Confidence Bound (EBUCB) framework that can efficiently accommodate bandit
problems in the presence of approximate inference. Our theoretical analysis
demonstrates that for Bernoulli multi-armed bandits, EBUCB can achieve the
optimal regret order $O(\log T)$ if the inference error measured by two
different $\alpha$-divergences is less than a constant, regardless of how large
this constant is. To our best knowledge, our study provides the first
theoretical regret bound that is better than $o(T)$ in the setting of constant
approximate inference error. Furthermore, in concordance with the negative
results in previous studies, we show that only one bounded $\alpha$-divergence
is insufficient to guarantee a sub-linear regret.
( 3
min )
We study the problem of learning decentralized linear quadratic regulator
when the system model is unknown a priori. We propose an online learning
algorithm that adaptively designs a control policy as new data samples from a
single system trajectory become available. Our algorithm design uses a
disturbance-feedback representation of state-feedback controllers coupled with
online convex optimization with memory and delayed feedback. We show that our
controller enjoys an expected regret that scales as $\sqrt{T}$ with the time
horizon $T$ for the case of partially nested information pattern. For more
general information patterns, the optimal controller is unknown even if the
system model is known. In this case, the regret of our controller is shown with
respect to a linear sub-optimal controller. We validate our theoretical
findings using numerical experiments.
( 2
min )
This paper introduces a new approach to address the issue of class imbalance
in graph neural networks (GNNs) for learning on graph-structured data. Our
approach integrates imbalanced node classification and Bias-Variance
Decomposition, establishing a theoretical framework that closely relates data
imbalance to model variance. We also leverage graph augmentation technique to
estimate the variance, and design a regularization term to alleviate the impact
of imbalance. Exhaustive tests are conducted on multiple benchmarks, including
naturally imbalanced datasets and public-split class-imbalanced datasets,
demonstrating that our approach outperforms state-of-the-art methods in various
imbalanced scenarios. This work provides a novel theoretical perspective for
addressing the problem of imbalanced node classification in GNNs.
( 2
min )
The causalimages R package enables causal inference with image and image
sequence data, providing new tools for integrating novel data sources like
satellite and bio-medical imagery into the study of cause and effect. One set
of functions enables image-based causal inference analyses. For example, one
key function decomposes treatment effect heterogeneity by images using an
interpretable Bayesian framework. This allows for determining which types of
images or image sequences are most responsive to interventions. A second
modeling function allows researchers to control for confounding using images.
The package also allows investigators to produce embeddings that serve as
vector summaries of the image or video content. Finally, infrastructural
functions are also provided, such as tools for writing large-scale image and
image sequence data as sequentialized byte strings for more rapid image
analysis. causalimages therefore opens new capabilities for causal inference in
R, letting researchers use informative imagery in substantive analyses in a
fast and accessible manner.
( 2
min )
With the exponential growth in large language models (LLMs), leveraging their
emergent properties for specialized domains like finance merits exploration.
However, regulated fields such as finance pose unique constraints, requiring
domain-optimized frameworks. We present ConFIRM, an LLM-based conversational
financial information retrieval model tailored for query intent classification
and knowledge base labeling.
ConFIRM comprises two modules:
1) a method to synthesize finance domain-specific question-answer pairs, and
2) evaluation of parameter efficient fine-tuning approaches for the query
classification task. We generate a dataset of over 4000 samples, assessing
accuracy on a separate test set.
ConFIRM achieved over 90% accuracy, essential for regulatory compliance.
ConFIRM provides a data-efficient solution to extract precise query intent for
financial dialog systems.
( 2
min )
Since their inception, Variational Autoencoders (VAEs) have become central in
machine learning. Despite their widespread use, numerous questions regarding
their theoretical properties remain open. Using PAC-Bayesian theory, this work
develops statistical guarantees for VAEs. First, we derive the first
PAC-Bayesian bound for posterior distributions conditioned on individual
samples from the data-generating distribution. Then, we utilize this result to
develop generalization guarantees for the VAE's reconstruction loss, as well as
upper bounds on the distance between the input and the regenerated
distributions. More importantly, we provide upper bounds on the Wasserstein
distance between the input distribution and the distribution defined by the
VAE's generative model.
( 2
min )
Shapley values are among the most popular tools for explaining predictions of
blackbox machine learning models. However, their high computational cost
motivates the use of sampling approximations, inducing a considerable degree of
uncertainty. To stabilize these model explanations, we propose ControlSHAP, an
approach based on the Monte Carlo technique of control variates. Our
methodology is applicable to any machine learning model and requires virtually
no extra computation or modeling effort. On several high-dimensional datasets,
we find it can produce dramatic reductions in the Monte Carlo variability of
Shapley estimates.
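The underlying control-variate identity can be sketched on a toy estimand; the functions `f` and `g` below are illustrative stand-ins, not ControlSHAP's surrogate construction for Shapley values:

```python
import numpy as np

# Control variates: to estimate E[f(X)], subtract a correlated statistic
# g(X) with known mean, scaled by the variance-minimizing coefficient.
rng = np.random.default_rng(2)
f = lambda x: np.exp(x)   # quantity of interest; E[e^X] = e^{1/2} for X ~ N(0,1)
g = lambda x: x           # control variate with known mean E[g(X)] = 0

X = rng.standard_normal(100_000)
fx, gx = f(X), g(X)
beta = np.cov(fx, gx)[0, 1] / np.var(gx)  # estimated optimal coefficient
plain = fx                                # naive Monte Carlo samples
cv = fx - beta * (gx - 0.0)               # variance-reduced samples, same mean

assert abs(cv.mean() - np.exp(0.5)) < 0.05
assert cv.var() < plain.var()
```

ControlSHAP applies the same identity to sampled Shapley values, using a cheap correlated surrogate as the control, which is why it needs virtually no extra computation.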
( 2
min )
Accurately predicting the elastic properties of crystalline solids is vital
for computational materials science. However, traditional atomistic scale ab
initio approaches are computationally intensive, especially for studying
complex materials with a large number of atoms in a unit cell. We introduce a
novel data-driven approach to efficiently predict the elastic properties of
crystal structures using SE(3)-equivariant graph neural networks (GNNs). This
approach yields important scalar elastic moduli with the accuracy comparable to
recent data-driven studies. Importantly, our symmetry-aware GNN model also
enables the prediction of the strain energy density (SED) and the associated
elastic constants, the fundamental tensorial quantities that are significantly
influenced by a material's crystallographic group. The model consistently
distinguishes independent elements of SED tensors, in accordance with the
symmetry of the crystal structures. Finally, our deep learning model possesses
meaningful latent features, offering an interpretable prediction of the elastic
properties.
( 2
min )
Multiscale is a hallmark feature of complex nonlinear systems. While the
simulation using the classical numerical methods is restricted by the local
Taylor series constraints, the multiscale techniques are often limited
by finding heuristic closures. This study proposes a new method for simulating
multiscale problems using deep neural networks. By leveraging the hierarchical
learning of neural network time steppers, the method adapts time steps to
approximate dynamical system flow maps across timescales. This approach
achieves state-of-the-art performance in less computational time compared to
fixed-step neural network solvers. The proposed method is demonstrated on
several nonlinear dynamical systems, and source codes are provided for
implementation. This method has the potential to benefit multiscale analysis of
complex systems and encourage further investigation in this area.
( 2
min )
Knee-Joint Osteoarthritis (KOA) is a prevalent cause of global disability and
is inherently complex to diagnose due to its subtle radiographic markers and
individualized progression. One promising classification avenue involves
applying deep learning methods; however, these techniques demand extensive,
diversified datasets, which pose substantial challenges due to medical data
collection restrictions. Existing practices typically resort to smaller
datasets and transfer learning. However, this approach often inherits
unnecessary pre-learned features that can clutter the classifier's vector
space, potentially hampering performance. This study proposes a novel paradigm
for improving post-training specialized classifiers by introducing adaptive
variance thresholding (AVT) followed by Neural Architecture Search (NAS). This
approach led to two key outcomes: an increase in the initial accuracy of the
pre-trained KOA models and a 60-fold reduction in the NAS input vector space,
thus facilitating faster inference speed and a more efficient hyperparameter
search. We also applied this approach to an external model trained for KOA
classification. Despite its initial performance, the application of our
methodology improved its average accuracy, making it one of the top three KOA
classification models.
( 2
min )
Positional encodings are employed to capture the high frequency information
of the encoded signals in implicit neural representation (INR). In this paper,
we propose a novel positional encoding method which improves the reconstruction
quality of the INR. The proposed embedding method is more advantageous for
compact data representation because it has a greater number of frequency bases
than existing methods. Our experiments show that the proposed method achieves a
significant gain in rate-distortion performance in the compression task,
without introducing any additional complexity, and higher reconstruction
quality in novel view synthesis.
( 2
min )
Knee Osteoarthritis (KOA), a leading cause of disability worldwide, is
challenging to detect early due to subtle radiographic indicators. Diverse,
extensive datasets are needed but are challenging to compile because of
privacy, data collection limitations, and the progressive nature of KOA.
However, a model capable of projecting genuine radiographs into different OA
stages could augment data pools, enhance algorithm training, and offer
pre-emptive prognostic insights. In this study, we trained a CycleGAN model to
synthesize past and future stages of KOA on any genuine radiograph. The model
was validated using a Convolutional Neural Network that was deceived into
misclassifying disease stages in transformed images, demonstrating the
CycleGAN's ability to effectively transform disease characteristics forward or
backward in time. The model was particularly effective in synthesizing future
disease states and showed an exceptional ability to retroactively transition
late-stage radiographs to earlier stages by eliminating osteophytes and
expanding knee joint space, signature characteristics of None or Doubtful KOA.
The model's results signify a promising potential for enhancing diagnostic
models, data augmentation, and educational and prognostic usage in healthcare.
Nevertheless, further refinement, validation, and a broader evaluation process
encompassing both CNN-based assessments and expert medical feedback are
emphasized for future research and development.
( 2
min )
This paper proposes an algorithm that implements binary encoding of the
categorical features of neural network model input data, together with changes
to the forward and backpropagation procedures. The goal is that model weight
changes resulting from learning on data instances of some feature category
only affect the forward-pass calculations for input instances of that same
category, as is the case when one-hot encoding is used for categorical
features.
( 2
min )
This paper explores the critical role of differentiation approaches for
data-driven differential equation discovery. Accurate derivatives of the input
data are essential for reliable algorithmic operation, particularly in
real-world scenarios where measurement quality is inevitably compromised. We
propose alternatives to the commonly used finite differences-based method,
notorious for its instability in the presence of noise, which can exacerbate
random errors in the data. Our analysis covers four distinct methods:
Savitzky-Golay filtering, spectral differentiation, smoothing based on
artificial neural networks, and the regularization of derivative variation. We
evaluate these methods in terms of applicability to problems, similar to the
real ones, and their ability to ensure the convergence of equation discovery
algorithms, providing valuable insights for robust modeling of real-world
processes.
( 2
min )
In this study, we present an investigation into the anisotropy dynamics and
intrinsic dimension of embeddings in transformer architectures, focusing on the
dichotomy between encoders and decoders. Our findings reveal that the
anisotropy profile in transformer decoders exhibits a distinct bell-shaped
curve, with the highest anisotropy concentrations in the middle layers. This
pattern diverges from the more uniformly distributed anisotropy observed in
encoders. In addition, we found that the intrinsic dimension of embeddings
increases in the initial phases of training, indicating an expansion into
higher-dimensional space, which is then followed by a compression phase towards
the end of training in which dimensionality decreases, suggesting a refinement
into more compact representations. Our results provide fresh insights into the
embedding properties of encoders and decoders.
( 2
min )
Recent research indicates that frequent model communication stands as a major
bottleneck to the efficiency of decentralized machine learning (ML),
particularly for large-scale and over-parameterized neural networks (NNs). In
this paper, we introduce MALCOM-PSGD, a new decentralized ML algorithm that
strategically integrates gradient compression techniques with model
sparsification. MALCOM-PSGD leverages proximal stochastic gradient descent to
handle the non-smoothness resulting from the $\ell_1$ regularization in model
sparsification. Furthermore, we adapt vector source coding and dithering-based
quantization for compressed gradient communication of sparsified models. Our
analysis shows that decentralized proximal stochastic gradient descent with
compressed communication has a convergence rate of
$\mathcal{O}\left(\ln(t)/\sqrt{t}\right)$ assuming a diminishing learning rate
and where $t$ denotes the number of iterations. Numerical results verify our
theoretical findings and demonstrate that our method reduces communication
costs by approximately $75\%$ when compared to the state-of-the-art method.
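The dithering idea can be sketched in isolation; this is a toy scalar quantizer illustrating unbiasedness only, not MALCOM-PSGD's full vector-source-coding pipeline:

```python
import numpy as np

def dithered_quantize(x, step, rng):
    """Uniform quantizer with random dither: E[Q(x)] = x, so the
    compressed gradients carry no systematic quantization bias."""
    u = rng.uniform(-0.5, 0.5, size=x.shape)  # fresh dither per message
    return step * np.round(x / step + u)

rng = np.random.default_rng(4)
g = np.array([0.137, -0.42, 0.9])  # a "gradient" to communicate
q = np.mean([dithered_quantize(g, 0.25, rng) for _ in range(20_000)], axis=0)

assert np.allclose(q, g, atol=0.01)  # unbiased on average across messages
```

Each transmitted value is one of a few grid points (hence cheap to entropy-code), while averaging over rounds recovers the true gradient, which is what makes the compressed updates compatible with the stated convergence rate.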
( 2
min )
Advanced materials are needed to further next-generation technologies such as
quantum computing, carbon capture, and low-cost medical imaging. However,
advanced materials discovery is confounded by two fundamental challenges: the
challenge of a high-dimensional, complex materials search space and the
challenge of combining knowledge, i.e., data fusion across instruments and
labs. To overcome the first challenge, researchers employ knowledge of the
underlying material synthesis-structure-property relationship, as a material's
structure is often predictive of its functional property and vice versa. For
example, optimal materials often occur along composition-phase boundaries or
within specific phase regions. Additionally, knowledge of the
synthesis-structure-property relationship is fundamental to understanding
underlying physical mechanisms. However, quantifying the
synthesis-structure-property relationship requires overcoming the second
challenge. Researchers must merge knowledge gathered across instruments,
measurement modalities, and even laboratories. We present the
Synthesis-structure-property relAtionship coreGionalized lEarner (SAGE)
algorithm, a fully Bayesian algorithm that uses multimodal coregionalization
to merge knowledge across data sources and learn
synthesis-structure-property relationships.
( 2
min )
Deep neural networks have achieved significant success in the last decades,
but they are not well-calibrated and often produce unreliable predictions. A
large body of literature relies on uncertainty quantification to evaluate the
reliability of a learning model, which is particularly important for
applications of out-of-distribution (OOD) detection and misclassification
detection. We are interested in uncertainty quantification for interdependent
node-level classification. We start our analysis based on graph posterior
networks (GPNs) that optimize the uncertainty cross-entropy (UCE)-based loss
function. We describe the theoretical limitations of the widely-used UCE loss.
To alleviate the identified drawbacks, we propose a distance-based
regularization that encourages clustered OOD nodes to remain clustered in the
latent space. We conduct extensive comparison experiments on eight standard
datasets and demonstrate that the proposed regularization outperforms the
state-of-the-art in both OOD detection and misclassification detection.
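As a concrete sketch of the loss the analysis starts from: for Dirichlet-based posterior networks, the uncertainty cross-entropy has the closed form $\mathbb{E}_{p\sim\mathrm{Dir}(\alpha)}[-\log p_y] = \psi(\alpha_0) - \psi(\alpha_y)$, with $\psi$ the digamma function. The parameter values below are illustrative, not from the paper.

```python
import numpy as np
from scipy.special import digamma

def uce_loss(alpha, y):
    """Uncertainty cross-entropy for Dirichlet parameters alpha (shape [N, C])
    and one-hot labels y (shape [N, C]):
    E_{p ~ Dir(alpha)}[-log p_y] = digamma(alpha_0) - digamma(alpha_y)."""
    alpha0 = alpha.sum(axis=1, keepdims=True)
    return float(((digamma(alpha0) - digamma(alpha)) * y).sum(axis=1).mean())

# Concentrated evidence on the true class gives low UCE; diffuse evidence
# gives higher UCE (illustrative numbers).
y = np.array([[1.0, 0.0, 0.0]])
confident = np.array([[100.0, 1.0, 1.0]])
diffuse = np.array([[1.0, 1.0, 1.0]])
```

Confidently correct predictions drive the loss toward zero, which is one source of the limitations the paper analyzes.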
( 2
min )
With the advances in computationally efficient artificial intelligence (AI)
techniques and their numerous applications in our everyday lives, there is a
pressing need to understand, through more detailed explanations, the
computational details hidden in black-box AI techniques such as popular machine
learning and deep learning methods. Explainable AI (xAI) emerged from these
challenges and has recently gained attention among researchers seeking to add
comprehensive explainability to traditional AI systems. This motivates the
development of an appropriate framework for successful applications of xAI in
real-life scenarios with respect to innovation, risk mitigation, ethical
issues, and value to users. In this book chapter, an in-depth analysis of
several xAI frameworks and methods, including LIME (Local Interpretable
Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations), is
provided. A Random Forest classifier, as a black-box AI model, is applied to a
publicly available diabetes-symptoms dataset with LIME and SHAP for better
interpretation. The results obtained are promising in terms of the
transparency, validity, and trustworthiness of diabetes disease prediction.
( 2
min )
We show that a constant-size constant-error coreset for polytope distance is
simple to maintain under merges of coresets. However, increasing the size
cannot improve the error bound significantly beyond that constant.
( 2
min )
This report explores the theory that explains the high sparsity phenomenon
\citep{tosato2023emergent} observed in the forward-forward algorithm
\citep{hinton2022forward}. The two theorems proposed predict the sparsity
changes of a single data point's activation in two cases: Theorem
\ref{theorem:1}: Decrease the goodness of the whole batch. Theorem
\ref{theorem:2}: Apply the complete forward-forward algorithm to decrease the
goodness for negative data and increase the goodness for positive data. The
theory aligns well with the experiments tested on the MNIST dataset.
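For reference, a minimal sketch of the quantities involved: in the forward-forward algorithm, a layer's "goodness" is the sum of squared activations, and each layer is trained to push goodness above a threshold for positive data and below it for negative data. The logistic form and threshold below follow Hinton's original formulation; the inputs are illustrative.

```python
import numpy as np

def goodness(h):
    # "Goodness" of a layer's activation vector: sum of squared activities.
    return np.sum(h ** 2, axis=-1)

def ff_layer_loss(h_pos, h_neg, theta=2.0):
    """Per-layer forward-forward objective (sketch): a logistic loss that
    pushes goodness of positive data above theta and of negative data
    below it."""
    g_pos, g_neg = goodness(h_pos), goodness(h_neg)
    return float((np.log1p(np.exp(-(g_pos - theta))) +
                  np.log1p(np.exp(g_neg - theta))).mean())
```

Decreasing the goodness of negative data while increasing it for positive data is exactly the regime in which the cited sparsity theorems apply.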
( 2
min )
We present an oracle-efficient relaxation for the adversarial contextual
bandits problem, where the contexts are sequentially drawn i.i.d from a known
distribution and the cost sequence is chosen by an online adversary. Our
algorithm has a regret bound of
$O(T^{\frac{2}{3}}(K\log(|\Pi|))^{\frac{1}{3}})$ and makes at most $O(K)$ calls
per round to an offline optimization oracle, where $K$ denotes the number of
actions, $T$ denotes the number of rounds and $\Pi$ denotes the set of
policies. This is the first result to improve the prior best bound of
$O((TK)^{\frac{2}{3}}(\log(|\Pi|))^{\frac{1}{3}})$ as obtained by Syrgkanis et
al. at NeurIPS 2016, and the first to match the original bound of Langford and
Zhang at NeurIPS 2007 which was obtained for the stochastic case.
( 2
min )
Bayesian bandit algorithms with approximate Bayesian inference have been
widely used in real-world applications. However, there is a large discrepancy
between the superior practical performance of these approaches and their
theoretical justification. Previous research only indicates a negative
theoretical result: Thompson sampling could have a worst-case linear regret
$\Omega(T)$ with a constant threshold on the inference error measured by one
$\alpha$-divergence. To bridge this gap, we propose an Enhanced Bayesian Upper
Confidence Bound (EBUCB) framework that can efficiently accommodate bandit
problems in the presence of approximate inference. Our theoretical analysis
demonstrates that for Bernoulli multi-armed bandits, EBUCB can achieve the
optimal regret order $O(\log T)$ if the inference error measured by two
different $\alpha$-divergences is less than a constant, regardless of how large
this constant is. To the best of our knowledge, our study provides the first
theoretical regret bound that is better than $o(T)$ in the setting of constant
approximate inference error. Furthermore, in concordance with the negative
results in previous studies, we show that only one bounded $\alpha$-divergence
is insufficient to guarantee a sub-linear regret.
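For orientation, here is the exact-inference baseline the paper's analysis is framed against: Thompson sampling for Bernoulli bandits with conjugate Beta posteriors. This is a standard sketch, not the paper's EBUCB algorithm; the arm means and horizon are illustrative.

```python
import numpy as np

def thompson_bernoulli(true_means, horizon, seed=0):
    """Thompson sampling with exact Beta(1, 1)-prior posteriors for
    Bernoulli bandits: each round, draw one sample from every arm's
    posterior and pull the arm with the largest draw."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    succ, fail = np.ones(k), np.ones(k)   # Beta posterior parameters
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        theta = rng.beta(succ, fail)      # one posterior draw per arm
        arm = int(np.argmax(theta))
        reward = rng.random() < true_means[arm]
        succ[arm] += reward
        fail[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

The paper's point is that replacing these exact posteriors with approximate inference can break the $O(\log T)$ regret unless the inference error is controlled in two $\alpha$-divergences.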
( 3
min )
Real-time density estimation is ubiquitous in many applications, including
computer vision and signal processing. Kernel density estimation is arguably
one of the most commonly used density estimation techniques, and the use of
"sliding window" mechanism adapts kernel density estimators to dynamic
processes. In this paper, we derive the asymptotic mean integrated squared
error (AMISE) upper bound for the "sliding window" kernel density estimator.
This upper bound provides a principled guide to devise a novel estimator, which
we name the temporal adaptive kernel density estimator (TAKDE). Compared to
heuristic approaches for "sliding window" kernel density estimators, TAKDE is
theoretically optimal in terms of the worst-case AMISE. We provide numerical
experiments using synthetic and real-world datasets, showing that TAKDE
outperforms other state-of-the-art dynamic density estimators (including those
outside of the kernel family). In particular, TAKDE achieves a superior test
log-likelihood with a smaller runtime.
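A minimal sketch of the estimator the paper builds on: a Gaussian KDE restricted to the most recent points of a stream. TAKDE itself additionally tunes weights and bandwidth via its worst-case AMISE bound; the window size and bandwidth below are illustrative choices, not the paper's.

```python
import numpy as np

def sliding_window_kde(stream, window, bandwidth, grid):
    """Gaussian kernel density estimate over the most recent `window`
    points of a data stream (the basic "sliding window" estimator)."""
    recent = np.asarray(stream[-window:], dtype=float)
    z = (grid[:, None] - recent[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.sum(axis=1) / (len(recent) * bandwidth)
```

As new points arrive, only the retained window changes, so the estimate can be refreshed in time linear in the window size.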
( 2
min )
We show the consistency of the maximum likelihood estimator for mixtures of
elliptically symmetric distributions as an estimator of its population version,
where the underlying distribution $P$ is nonparametric and does not
necessarily belong to the class of mixtures on which the estimator is based. In
a situation where $P$ is a mixture of well enough separated but nonparametric
distributions it is shown that the components of the population version of the
estimator correspond to the well separated components of $P$. This provides
some theoretical justification for the use of such estimators for cluster
analysis in case that $P$ has well separated subpopulations even if these
subpopulations differ from what the mixture model assumes.
( 2
min )
Recent advances have shown that GP priors, or their finite realisations, can
be encoded using deep generative models such as variational autoencoders
(VAEs). These learned generators can serve as drop-in replacements for the
original priors during MCMC inference. While this approach enables efficient
inference, it loses information about the hyperparameters of the original
models, and consequently makes inference over hyperparameters impossible and
the learned priors indistinct. To overcome this limitation, we condition the
VAE on stochastic process hyperparameters. This allows the joint encoding of
hyperparameters with GP realizations and their subsequent estimation during
inference. Further, we demonstrate that our proposed method, PriorCVAE, is
agnostic to the nature of the models which it approximates, and can be used,
for instance, to encode solutions of ODEs. It provides a practical tool for
approximate inference and shows potential in real-life spatial and
spatiotemporal applications.
( 2
min )
Shapley values are among the most popular tools for explaining predictions of
blackbox machine learning models. However, their high computational cost
motivates the use of sampling approximations, inducing a considerable degree of
uncertainty. To stabilize these model explanations, we propose ControlSHAP, an
approach based on the Monte Carlo technique of control variates. Our
methodology is applicable to any machine learning model and requires virtually
no extra computation or modeling effort. On several high-dimensional datasets,
we find it can produce dramatic reductions in the Monte Carlo variability of
Shapley estimates.
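The control-variate idea underlying the method can be sketched generically: to estimate $\mathbb{E}[f(X)]$, subtract a correlated statistic $g(X)$ with known mean, scaled by the (estimated) optimal coefficient $\mathrm{Cov}(f,g)/\mathrm{Var}(g)$. The target and control below are illustrative, not the Shapley-specific constructions of the paper.

```python
import numpy as np

def cv_estimate(f, g, g_mean, x):
    """Control-variate Monte Carlo estimate of E[f(X)] using a correlated
    statistic g(X) with known mean g_mean. The coefficient
    Cov(f, g) / Var(g) is plugged in from the sample."""
    fx, gx = f(x), g(x)
    c = np.cov(fx, gx)[0, 1] / np.var(gx)
    return float(np.mean(fx - c * (gx - g_mean)))

# Illustrative target: E[e^X] for X ~ N(0, 1), whose true value is e^{1/2};
# the control g(X) = X has known mean 0 and is strongly correlated with f.
f = lambda x: np.exp(x)
g = lambda x: x
```

Because the correction term has mean zero, the estimator stays unbiased while its variance shrinks by a factor of roughly $1 - \rho^2$, where $\rho$ is the correlation between $f$ and $g$.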
( 2
min )
We investigate how shallow ReLU networks interpolate between known regions.
Our analysis shows that empirical risk minimizers converge to a minimum-norm
interpolant as the number of data points and parameters tends to infinity,
provided the weight decay regularization coefficient vanishes at a precise
rate as the network width and the number of data points grow. With and
without explicit regularization, we numerically study the implicit bias of
common optimization algorithms towards known minimum norm interpolants.
( 2
min )
Since their inception, Variational Autoencoders (VAEs) have become central in
machine learning. Despite their widespread use, numerous questions regarding
their theoretical properties remain open. Using PAC-Bayesian theory, this work
develops statistical guarantees for VAEs. First, we derive the first
PAC-Bayesian bound for posterior distributions conditioned on individual
samples from the data-generating distribution. Then, we utilize this result to
develop generalization guarantees for the VAE's reconstruction loss, as well as
upper bounds on the distance between the input and the regenerated
distributions. More importantly, we provide upper bounds on the Wasserstein
distance between the input distribution and the distribution defined by the
VAE's generative model.
( 2
min )
We present the new Orthogonal Polynomials Approximation Algorithm (OPAA), a
parallelizable algorithm that solves two problems from a functional analytic
approach: first, it finds a smooth functional estimate of a density function,
whether it is normalized or not; second, the algorithm provides an estimate of
the normalizing weight. In the context of Bayesian inference, OPAA provides an
estimate of the posterior function as well as the normalizing weight, which is
also known as the evidence.
A core component of OPAA is a special transform of the square root of the
joint distribution into a functional space of our own construction. Through
this transform, the evidence is equated with the $L^2$ norm of the transformed
function, squared. Hence, the evidence can be estimated by the sum of squares
of the transform coefficients. The computations can be parallelized and
completed in one pass.
To compute the transform coefficients, OPAA proposes a new computational
scheme leveraging Gauss--Hermite quadrature in higher dimensions. Not only does
it avoid the potential high variance problem associated with random sampling
methods, it also enables one to speed up the computation by parallelization,
and significantly reduces the complexity by a vector decomposition.
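The Parseval argument can be illustrated in one dimension (OPAA itself works in higher dimensions): expand the square root of an unnormalized density in orthonormal Hermite functions, compute the coefficients by Gauss-Hermite quadrature, and the evidence is approximately the sum of squared coefficients. The truncation order, quadrature size, and example density below are our illustrative choices.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss, hermval

def hermite_fn(k, x):
    # Orthonormal Hermite function: H_k(x) e^{-x^2/2} / sqrt(2^k k! sqrt(pi)).
    coef = np.zeros(k + 1)
    coef[k] = 1.0
    norm = math.sqrt(2.0 ** k * math.factorial(k) * math.sqrt(math.pi))
    return hermval(x, coef) * np.exp(-0.5 * x ** 2) / norm

def evidence_estimate(unnorm_density, K=15, n=30):
    """Evidence Z = int f(x) dx of an unnormalized density f, via Parseval:
    expand sqrt(f) in Hermite functions and sum the squared coefficients
    (a 1-D sketch of the OPAA idea)."""
    x, w = hermgauss(n)              # nodes and weights for weight e^{-x^2}
    lift = w * np.exp(x ** 2)        # undo the built-in quadrature weight
    root = np.sqrt(unnorm_density(x))
    coeffs = [np.sum(lift * root * hermite_fn(k, x)) for k in range(K)]
    return float(np.sum(np.square(coeffs)))

# Example: f = 3 * standard normal pdf, so the true evidence is 3.
f = lambda x: 3.0 * np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
```

Each coefficient is an independent quadrature sum, which is why the computation parallelizes and completes in one pass.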
( 2
min )
A classic inferential statistical problem is the goodness-of-fit (GOF) test.
Such a test can be challenging when the hypothesized parametric model has an
intractable likelihood and its distributional form is not available. Bayesian
methods for GOF can be appealing due to their ability to incorporate expert
knowledge through prior distributions.
However, standard Bayesian methods for this test often require strong
distributional assumptions on the data and their relevant parameters. To
address this issue, we propose a semi-Bayesian nonparametric (semi-BNP)
procedure in the context of the maximum mean discrepancy (MMD) measure that can
be applied to the GOF test. Our method introduces a novel Bayesian estimator
for the MMD, enabling the development of a measure-based hypothesis test for
intractable models. Through extensive experiments, we demonstrate that our
proposed test outperforms frequentist MMD-based methods by achieving lower
false rejection and false acceptance rates of the null hypothesis. Furthermore, we
showcase the versatility of our approach by embedding the proposed estimator
within a generative adversarial network (GAN) framework. It facilitates a
robust BNP learning approach as another significant application of our method.
With our BNP procedure, this new GAN approach can enhance sample diversity and
improve inferential accuracy compared to traditional techniques.
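For context, the frequentist building block is the unbiased estimator of squared MMD with a kernel; the paper's contribution is a Bayesian estimator of this quantity. A standard sketch with an RBF kernel on one-dimensional samples (bandwidth and sample sizes illustrative):

```python
import numpy as np

def mmd2_unbiased(x, y, gamma=0.5):
    """Unbiased estimator of the squared maximum mean discrepancy between
    1-D samples x and y with RBF kernel k(a, b) = exp(-gamma * (a - b)^2)."""
    def k(a, b):
        return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    m, n = len(x), len(y)
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))  # drop diagonal
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return float(term_x + term_y - 2.0 * kxy.mean())
```

Under the null (both samples from the hypothesized model) the statistic concentrates near zero, which is what a GOF test thresholds against.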
( 3
min )
Early-exit neural networks (EENNs) facilitate adaptive inference by producing
predictions at multiple stages of the forward pass. In safety-critical
applications, these predictions are only meaningful when complemented with
reliable uncertainty estimates. Yet, due to their sequential structure, an
EENN's uncertainty estimates should also be consistent: labels that are deemed
improbable at one exit should not reappear within the confidence interval / set
of later exits. We show that standard uncertainty quantification techniques,
like Bayesian methods or conformal prediction, can lead to inconsistency across
exits. We address this problem by applying anytime-valid confidence sequences
(AVCSs) to the exits of EENNs. By design, AVCSs maintain consistency across
exits. We examine the theoretical and practical challenges of applying AVCSs to
EENNs and empirically validate our approach on both regression and
classification tasks.
( 2
min )
NVIDIA today unveiled at SC23 the next wave of technologies that will lift scientific and industrial research centers worldwide to new levels of performance and energy efficiency. “NVIDIA hardware and software innovations are creating a new class of AI supercomputers,” said Ian Buck, vice president of the company’s high performance computing and hyperscale data center Read article >
( 9
min )
A widely acclaimed large language model for genomic data has demonstrated its ability to generate gene sequences that closely resemble real-world variants of SARS-CoV-2, the virus behind COVID-19. Called GenSLMs, the model, which last year won the Gordon Bell special prize for high performance computing-based COVID-19 research, was trained on a dataset of nucleotide sequences Read article >
( 6
min )
Michael Kuehn and Davide Vodola are taking to new heights work that’s pioneering quantum computing for the world’s largest chemical company. The BASF researchers are demonstrating how a quantum algorithm can see what no traditional simulation can — key attributes of NTA, a compound with applications that include removing toxic metals like iron from a Read article >
( 6
min )
Dozens of new supercomputers for scientific computing will soon hop online, powered by NVIDIA’s breakthrough GH200 Grace Hopper Superchip for giant-scale AI and high performance computing. The NVIDIA GH200 enables scientists and researchers to tackle the world’s most challenging problems by accelerating complex AI and HPC applications running terabytes of data. At the SC23 supercomputing Read article >
( 6
min )
At a basic level, Machine Learning (ML) technology learns from data to make predictions. Businesses use their data with an ML-powered personalization service to elevate their customer experience. This approach allows businesses to use data to derive actionable insights and help grow their revenue and brand loyalty. Amazon Personalize accelerates your digital transformation with ML, […]
( 8
min )
One of the most common applications of generative AI and large language models (LLMs) is answering questions based on a specific external knowledge corpus. Retrieval-Augmented Generation (RAG) is a popular technique for building question answering systems that use an external knowledge base. To learn more, refer to Build a powerful question answering bot with Amazon […]
( 7
min )
AI Weirdness: the strange side of machine learning
( 2
min )
In recent years, significant progress in generative AI has highlighted the
important role of physics-inspired models that utilize advanced mathematical
concepts based on fundamental physics principles to enhance artificial
intelligence capabilities. Among these models, those based on diffusion
equations have greatly improved image quality. This study aims to explore the
potential uses of the Maxwell-Boltzmann equation, which forms the basis of the
kinetic theory of gases, and the Michaelis-Menten model in Marketing Mix
Modelling (MMM) applications. We propose incorporating these equations into
Hierarchical Bayesian models to analyse consumer behaviour in the context of
advertising. These equation sets excel in accurately describing the random
dynamics in complex systems like social interactions and consumer-advertising
interactions.
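Of the two ingredients, the Michaelis-Menten form is the simplest to state: a saturating response $v = V_{\max}\, s / (K_m + s)$. Reading $s$ as ad spend is our illustrative framing of the MMM application, not the paper's fitted model.

```python
import numpy as np

def michaelis_menten_response(spend, v_max, k_m):
    """Michaelis-Menten saturation curve read as a diminishing-returns
    ad-response function: response rises with spend, saturates at v_max,
    and reaches v_max / 2 at spend = k_m."""
    spend = np.asarray(spend, dtype=float)
    return v_max * spend / (k_m + spend)
```

In a hierarchical Bayesian MMM, `v_max` and `k_m` would be given priors and estimated per channel; here they are free illustrative parameters.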
( 2
min )
Adjoint operators have been found to be effective in the exploration of CNN's
inner workings [1]. However, the previous no-bias assumption restricted its
generalization. We overcome the restriction via embedding input images into an
extended normed space that includes bias in all CNN layers as part of the
extended space and propose an adjoint-operator-based algorithm that maps
high-level weights back to the extended input space for reconstructing an
effective hypersurface. Such a hypersurface can be computed for an arbitrary unit
in the CNN, and we prove that this reconstructed hypersurface, when multiplied
by the original input (through an inner product), will precisely replicate the
output value of each unit. We show experimental results based on the CIFAR-10
and CIFAR-100 datasets, where the proposed approach achieves near-zero
activation-value reconstruction error.
( 2
min )
We consider the straggler problem in decentralized learning over a logical
ring while preserving user data privacy. Specifically, we extend the recently
proposed framework of differential privacy (DP) amplification by
decentralization by Cyffers and Bellet to include overall training
latency, comprising both computation and communication latency. Analytical
results on both the convergence speed and the DP level are derived for both a
skipping scheme (which ignores the stragglers after a timeout) and a baseline
scheme that waits for each node to finish before the training continues. A
trade-off between overall training latency, accuracy, and privacy,
parameterized by the timeout of the skipping scheme, is identified and
empirically validated for logistic regression on a real-world dataset and for
image classification using the MNIST and CIFAR-10 datasets.
( 2
min )
This paper integrates manifold learning techniques within a \emph{Gaussian
process upper confidence bound} algorithm to optimize an objective function on
a manifold. Our approach is motivated by applications where a full
representation of the manifold is not available and querying the objective is
expensive. We rely on a point cloud of manifold samples to define a graph
Gaussian process surrogate model for the objective. Query points are
sequentially chosen using the posterior distribution of the surrogate model
given all previous queries. We establish regret bounds in terms of the number
of queries and the size of the point cloud. Several numerical examples
complement the theory and illustrate the performance of our method.
( 2
min )
For a widely-studied data model and general loss and sample-hardening
functions we prove that the Supervised Contrastive Learning (SCL), Hard-SCL
(HSCL), and Unsupervised Contrastive Learning (UCL) risks are minimized by
representations that exhibit Neural Collapse (NC), i.e., the class means form
an Equiangular Tight Frame (ETF) and data from the same class are mapped to
the same representation. We also prove that for any representation mapping, the
HSCL and Hard-UCL (HUCL) risks are lower bounded by the corresponding SCL and
UCL risks. Although the optimality of ETF is known for SCL, albeit only for
InfoNCE loss, its optimality for HSCL and UCL under general loss and hardening
functions is novel. Moreover, our proofs are much simpler, compact, and
transparent. We empirically demonstrate, for the first time, that ADAM
optimization of HSCL and HUCL risks with random initialization and suitable
hardness levels can indeed converge to the NC geometry if we incorporate
unit-ball or unit-sphere feature normalization. Without incorporating hard
negatives or feature normalization, however, the representations learned via
ADAM suffer from dimensional collapse (DC) and fail to attain the NC geometry.
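The ETF geometry can be stated concretely: in the simplex ETF, all $C$ class means are unit-norm and every pair has cosine similarity exactly $-1/(C-1)$. A sketch that constructs it and checks the pairwise cosines (construction standard, not specific to the paper):

```python
import numpy as np

def simplex_etf(C):
    """Class means of a C-class simplex equiangular tight frame: the
    columns of sqrt(C / (C - 1)) * (I - (1/C) * ones)."""
    return np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)

def pairwise_cosines(M):
    # Off-diagonal cosine similarities between the columns of M.
    Mn = M / np.linalg.norm(M, axis=0, keepdims=True)
    G = Mn.T @ Mn
    return G[~np.eye(len(G), dtype=bool)]
```

A diagnostic like this (applied to the empirical class means of learned representations) is how convergence to the NC geometry is typically verified.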
( 2
min )
Federated Learning is expected to provide strong privacy guarantees, since only
gradients or model parameters, but no plain-text training data, are ever exchanged
either between the clients or between the clients and the central server. In
this paper, we challenge this claim by introducing a simple but still very
effective membership inference attack algorithm, which relies only on a single
training step. In contrast to the popular honest-but-curious model, we
investigate a framework with a dishonest central server. Our strategy is
applicable to models with ReLU activations and uses the properties of this
activation function to achieve perfect accuracy. Empirical evaluation on visual
classification tasks with MNIST, CIFAR10, CIFAR100 and CelebA datasets show
that our method provides perfect accuracy in identifying one sample in a
training set with thousands of samples. Occasional failures of our method lead
us to discover duplicate images in the CIFAR100 and CelebA datasets.
( 2
min )
In data-driven systems, data exploration is imperative for making real-time
decisions. However, big data is stored in massive databases from which
retrieval is difficult and slow. Approximate Query Processing (AQP) is a technique for providing
approximate answers to aggregate queries based on a summary of the data
(synopsis) that closely replicates the behavior of the actual data, which can
be useful where an approximate answer to the queries would be acceptable in a
fraction of the real execution time. This study explores the novel utilization
of Generative Adversarial Networks (GANs) in the generation of tabular data
that can be employed in AQP for synopsis construction. We thoroughly
investigate the unique challenges posed by the synopsis construction process,
including maintaining data distribution characteristics, handling bounded
continuous and categorical data, and preserving semantic relationships, and
then introduce advances in tabular GAN architectures that overcome these
challenges. Furthermore, we propose and validate a suite of statistical metrics
tailored for assessing the reliability of the GAN-generated synopses. Our
findings demonstrate that advanced GAN variations exhibit a promising capacity
to generate high-fidelity synopses, potentially transforming the efficiency and
effectiveness of AQP in data-driven systems.
( 2
min )
Self-supervised learning (SSL) for WiFi-based human activity recognition
(HAR) holds great promise due to its ability to address the challenge of
insufficient labeled data. However, directly transplanting SSL algorithms
originally designed for other domains, especially contrastive learning, to CSI
data often fails to achieve the expected performance. We attribute this issue
to the inappropriate alignment criteria, which disrupt the semantic distance
consistency between the feature space and the input space. To address this
challenge, we introduce \textbf{A}ntenna \textbf{R}esponse \textbf{C}onsistency
(ARC) as a solution to define proper alignment criteria. ARC is designed to
retain semantic information from the input space while introducing robustness
to real-world noise. Moreover, we substantiate the effectiveness of ARC through
a comprehensive set of experiments, demonstrating its capability to enhance the
performance of self-supervised learning for WiFi-based HAR, increasing accuracy
by over 5\% in most cases and reaching a best accuracy of 94.97\%.
( 2
min )
With the development of trustworthy Federated Learning (FL), the requirement
of implementing right to be forgotten gives rise to the area of Federated
Unlearning (FU). Compared to machine unlearning, a major challenge of FU lies
in the decentralized and privacy-preserving nature of FL, in which clients
jointly train a global model without sharing their raw data, making it
substantially more intricate to selectively unlearn specific information. In
that regard, many efforts have been made to tackle the challenges of FU and
significant progress has been achieved. In this paper, we present a comprehensive
survey of FU. Specifically, we review existing algorithms, objectives, and
evaluation metrics, and identify open challenges of FU. By reviewing and
comparing some studies, we summarize them into a taxonomy for various schemes,
potential applications and future directions.
( 2
min )
Open-set recognition (OSR), the identification of novel categories, can be a
critical component when deploying classification models in real-world
applications. Recent work has shown that familiarity-based scoring rules such
as the Maximum Softmax Probability (MSP) or the Maximum Logit Score (MLS) are
strong baselines when the closed-set accuracy is high. However, one potential
weakness of familiarity-based OSR is vulnerability to adversarial attacks. Here, we
present gradient-based adversarial attacks on familiarity scores for both types
of attacks, False Familiarity and False Novelty attacks, and evaluate their
effectiveness in informed and uninformed settings on TinyImageNet.
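The two familiarity scores under attack are simple functions of the classifier's logits; low scores flag potential novel-category inputs. A minimal sketch (logit values illustrative):

```python
import numpy as np

def familiarity_scores(logits):
    """OSR baseline scores from a logit matrix of shape [N, C]:
    Maximum Softmax Probability (MSP) and Maximum Logit Score (MLS)."""
    z = logits - logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p.max(axis=-1), logits.max(axis=-1)
```

A False Familiarity attack perturbs the input to raise these scores on a novel sample; a False Novelty attack lowers them on a known-class sample.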
( 2
min )
We prove an upper bound on the covering number of real algebraic varieties,
images of polynomial maps and semialgebraic sets. The bound remarkably improves
the best known bound by Yomdin-Comte, and its proof is much more
straightforward. As a consequence, our result gives a bound on volume of the
tubular neighborhood of a real variety, improving the results by Lotz and
Basu-Lerario. We apply our theory to three main application domains. Firstly,
we derive a near-optimal bound on the covering number of low rank CP tensors.
Secondly, we prove a bound on the sketching dimension for (general) polynomial
optimization problems. Lastly, we deduce generalization error bounds for deep
neural networks with rational or ReLU activations, improving or matching the
best known results in the literature.
( 2
min )
In this work, we present Transformer-based Powered Descent Guidance (T-PDG),
a scalable algorithm for reducing the computational complexity of the direct
optimization formulation of the spacecraft powered descent guidance problem.
T-PDG uses data from prior runs of trajectory optimization algorithms to train
a transformer neural network, which accurately predicts the relationship
between problem parameters and the globally optimal solution for the powered
descent guidance problem. The solution is encoded as the set of tight
constraints corresponding to the constrained minimum-cost trajectory and the
optimal final time of landing. By leveraging the attention mechanism of
transformer neural networks, large sequences of time series data can be
accurately predicted when given only the spacecraft state and landing site
parameters. When applied to the real problem of Mars powered descent guidance,
T-PDG reduces the time for computing the 3-degree-of-freedom fuel-optimal
trajectory, when compared to lossless convexification, from the order of 1-8
seconds to less than 500 milliseconds. A safe and optimal solution is
guaranteed by including a feasibility check in T-PDG before returning the final
trajectory.
( 2
min )
In this paper, we address the limitations of the common data annotation and
training methods for objective single-label classification tasks. Typically,
when annotating such tasks, annotators are asked to provide only a single label
for each sample, and annotator disagreement is discarded when a final hard label
is decided through majority voting. We challenge this traditional approach,
acknowledging that determining the appropriate label can be difficult due to
the ambiguity and lack of context in the data samples. Rather than discarding
the information from such ambiguous annotations, our soft label method makes
use of them for training. Our findings indicate that additional annotator
information, such as confidence, secondary label and disagreement, can be used
to effectively generate soft labels. Training classifiers with these soft
labels then leads to improved performance and calibration on the hard label
test set.
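One simple way to turn annotator information into soft labels is a confidence-weighted vote; this is an illustrative aggregation rule, one of several ways the paper uses confidence, secondary labels, and disagreement.

```python
import numpy as np

def soft_labels(votes, confidences, num_classes):
    """Confidence-weighted soft label from per-annotator votes:
    votes[i] is a class index, confidences[i] a weight in (0, 1].
    Returns a probability vector over the classes."""
    w = np.zeros(num_classes)
    for v, c in zip(votes, confidences):
        w[v] += c
    return w / w.sum()
```

Training against such distributions instead of majority-vote hard labels is what the paper credits for the improved accuracy and calibration.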
( 2
min )
The growing use of digital communication platforms has given rise to various
criminal activities, such as grooming and drug dealing, which pose significant
challenges to law enforcement and forensic experts. This paper presents a
supervised keyphrase extraction approach to detect relevant information in
high-volume chat logs involving grooming and drug dealing for forensic
analysis. The proposed method, JointKPE++, builds upon the JointKPE keyphrase
extractor by employing improvements to handle longer texts effectively. We
evaluate JointKPE++ using BERT-based pre-trained models on grooming and drug
dealing datasets, including BERT, RoBERTa, SpanBERT, and BERTimbau. The results
show significant improvements over traditional approaches and demonstrate the
potential for JointKPE++ to aid forensic experts in efficiently detecting
keyphrases related to criminal activities.
( 2
min )
We consider an unknown multivariate function representing a system (such as a
complex numerical simulator) taking both deterministic and uncertain inputs. Our
objective is to estimate the set of deterministic inputs leading to outputs
whose probability (with respect to the distribution of the uncertain inputs) of
belonging to a given set is less than a given threshold. This problem, which we
call Quantile Set Inversion (QSI), occurs for instance in the context of robust
(reliability-based) optimization problems, when looking for the set of
solutions that satisfy the constraints with sufficiently large probability. To
solve the QSI problem, we propose a Bayesian strategy based on Gaussian process
modeling and the Stepwise Uncertainty Reduction (SUR) principle, to
sequentially choose the points at which the function should be evaluated to
efficiently approximate the set of interest. We illustrate the performance and
interest of the proposed SUR strategy through several numerical experiments.
( 2
min )
Generalized self-concordance is a key property present in the objective
function of many important learning problems. We establish the convergence rate
of a simple Frank-Wolfe variant that uses the open-loop step size strategy
$\gamma_t = 2/(t+2)$, obtaining a $\mathcal{O}(1/t)$ convergence rate for this
class of functions in terms of primal gap and Frank-Wolfe gap, where $t$ is the
iteration count. This avoids the use of second-order information or the need to
estimate local smoothness parameters of previous work. We also show improved
convergence rates for various common cases, e.g., when the feasible region
under consideration is uniformly convex or polyhedral.
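The open-loop variant is easy to sketch end to end. Below, Frank-Wolfe runs on the probability simplex, where the linear minimization oracle just picks the vertex with the smallest gradient coordinate; the quadratic objective and iteration count are illustrative choices, not from the paper.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, T):
    """Frank-Wolfe on the probability simplex with the open-loop step size
    gamma_t = 2 / (t + 2). The LMO over the simplex selects the vertex
    with the most negative gradient coordinate."""
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        g = grad(x)
        v = np.zeros_like(x)
        v[np.argmin(g)] = 1.0              # LMO: best simplex vertex
        x = x + 2.0 / (t + 2) * (v - x)    # convex step keeps feasibility
    return x

# Illustrative problem: minimize ||x - b||^2 with b inside the simplex,
# so the optimum is b itself.
b = np.array([0.2, 0.3, 0.5])
x_star = frank_wolfe_simplex(lambda x: 2.0 * (x - b), np.array([1.0, 0.0, 0.0]), 5000)
```

Note the step size never queries smoothness or self-concordance parameters, which is exactly the appeal of the open-loop strategy the paper analyzes.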
( 2
min )
This paper integrates manifold learning techniques within a \emph{Gaussian
process upper confidence bound} algorithm to optimize an objective function on
a manifold. Our approach is motivated by applications where a full
representation of the manifold is not available and querying the objective is
expensive. We rely on a point cloud of manifold samples to define a graph
Gaussian process surrogate model for the objective. Query points are
sequentially chosen using the posterior distribution of the surrogate model
given all previous queries. We establish regret bounds in terms of the number
of queries and the size of the point cloud. Several numerical examples
complement the theory and illustrate the performance of our method.
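The query-selection step can be sketched with a generic kernel over the point cloud (the paper derives its surrogate from a graph built on the manifold samples; a plain RBF kernel on a 1-D point cloud stands in here):

```python
import numpy as np

def gp_posterior(K, K_star, k_ss, y, noise=1e-6):
    """Standard GP posterior mean/variance from a precomputed kernel."""
    A = np.linalg.solve(K + noise * np.eye(len(y)), K_star)
    mu = A.T @ y
    var = k_ss - np.sum(K_star * A, axis=0)
    return mu, var

def ucb_next(mu, var, beta=2.0):
    # Upper-confidence-bound acquisition over the point cloud.
    return int(np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0))))

# Point cloud sampled from a 1-D "manifold", RBF kernel between points.
pts = np.linspace(0.0, 1.0, 50)
K_full = np.exp(-((pts[:, None] - pts[None, :]) ** 2) / 0.02)
queried = [0, 49]                        # indices evaluated so far
y = np.sin(2 * np.pi * pts[queried])     # observed objective values
mu, var = gp_posterior(K_full[np.ix_(queried, queried)],
                       K_full[np.ix_(queried, np.arange(50))],
                       np.ones(50), y)
nxt = ucb_next(mu, var)                  # next point-cloud index to query
```

With only the two endpoints queried, the posterior variance peaks in the interior, so UCB sends the next query away from both previous ones.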
( 2
min )
The aim of this study is to define importance of predictors for black box
machine learning methods, where the prediction function can be complex and
cannot be represented by statistical parameters. In this paper we defined a
``Generalized Variable Importance Metric (GVIM)'' using the true conditional
expectation function for a continuous or a binary response variable. We further
showed that the defined GVIM can be represented as a function of the
Conditional Average Treatment Effect (CATE) for multinomial and continuous
predictors. Then we propose how the metric can be estimated using any machine
learning model. Finally, using simulations, we evaluated the properties
of the estimator when estimated from XGBoost, Random Forest and a mis-specified
generalized additive model.
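For a binary predictor, one illustrative reading of such a CATE-based importance (not the paper's exact definition) is the mean squared shift in predictions when that predictor is toggled:

```python
import numpy as np

rng = np.random.default_rng(1)

def gvim_binary(predict, X, j):
    """CATE-style importance of a binary predictor j: the mean squared
    difference between predictions with x_j forced to 1 versus 0.
    (An illustrative reading of the paper's metric, not its exact
    formula.)"""
    X1, X0 = X.copy(), X.copy()
    X1[:, j], X0[:, j] = 1.0, 0.0
    return float(np.mean((predict(X1) - predict(X0)) ** 2))

# Stand-in "model": the true conditional expectation E[y | x] = 3*x1,
# so x1 carries all the signal and x2 carries none.
predict = lambda X: 3.0 * X[:, 0]
X = np.column_stack([rng.integers(0, 2, 5000), rng.normal(size=5000)])

imp_x1 = gvim_binary(predict, X, 0)   # 9.0: toggling x1 shifts E[y] by 3
imp_x2 = gvim_binary(predict, X, 1)   # 0.0: x2 never enters the model
```

In practice `predict` would be a fitted black-box model (e.g., XGBoost) rather than the true conditional expectation.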
( 2
min )
When systems use data-driven models based on machine learning (ML),
errors in their results cannot be ruled out. This is particularly critical if
it remains unclear to the user how these models arrived at their decisions and
if errors can have safety-relevant consequences, as is often the case in the
medical field. In such cases, the use of dependable methods to quantify the
uncertainty remaining in a result allows the user to make an informed decision
about further usage and draw possible conclusions based on a given result. This
paper demonstrates the applicability and practical utility of the Uncertainty
Wrapper using flow cytometry as an application from the medical field that can
benefit from the use of ML models in conjunction with dependable and
transparent uncertainty quantification.
( 2
min )
In the recent past, using machine learning (ML) to make predictions, especially for data in the form of text and images, required extensive ML knowledge for creating and tuning of deep learning models. Today, ML has become more accessible to any user who wants to use ML models to generate business value. With Amazon SageMaker […]
( 7
min )
Creating high-performance machine learning (ML) solutions relies on exploring and optimizing training parameters, also known as hyperparameters. Hyperparameters are the knobs and levers that we use to adjust the training process, such as learning rate, batch size, regularization strength, and others, depending on the specific model and task at hand. Exploring hyperparameters involves systematically varying […]
( 20
min )
Thanks to a viral trend sweeping social media, we now know some men think about the Roman Empire every day. And thanks to Luke Farritor, a 21-year-old computer science undergrad at the University of Nebraska-Lincoln, and like-minded AI enthusiasts, there might soon be a lot more to think about. Blending a passion for history with Read article >
( 6
min )
Fetal brain MRI is becoming an increasingly relevant complement to
neurosonography for perinatal diagnosis, allowing fundamental insights into
fetal brain development throughout gestation. However, uncontrolled fetal
motion and heterogeneity in acquisition protocols lead to data of variable
quality, potentially biasing the outcome of subsequent studies. We present
FetMRQC, an open-source machine-learning framework for automated image quality
assessment and quality control that is robust to domain shifts induced by the
heterogeneity of clinical data. FetMRQC extracts an ensemble of quality metrics
from unprocessed anatomical MRI and combines them to predict experts' ratings
using random forests. We validate our framework on an unprecedentedly large and
diverse dataset of more than 1600 manually rated fetal brain T2-weighted images
from four clinical centers and 13 different scanners. Our study shows that
FetMRQC's predictions generalize well to unseen data while being interpretable.
FetMRQC is a step towards more robust fetal brain neuroimaging, which has the
potential to shed new insights on the developing human brain.
( 3
min )
Deep learning has taken by storm all fields involved in data analysis,
including remote sensing for Earth observation. However, despite significant
advances in terms of performance, its lack of explainability and
interpretability, inherent to neural networks in general since their inception,
remains a major source of criticism. Hence it comes as no surprise that the
expansion of deep learning methods in remote sensing is being accompanied by
increasingly intensive efforts oriented towards addressing this drawback
through the exploration of a wide spectrum of Explainable Artificial
Intelligence techniques. This chapter, organized according to prominent Earth
observation application fields, presents a panorama of the state-of-the-art in
explainable remote sensing image analysis.
( 2
min )
We introduce the text-to-instrument task, which aims at generating
sample-based musical instruments based on textual prompts. Accordingly, we
propose InstrumentGen, a model that extends a text-prompted generative audio
framework to condition on instrument family, source type, pitch (across an
88-key spectrum), velocity, and a joint text/audio embedding. Furthermore, we
present a differentiable loss function to evaluate the intra-instrument timbral
consistency of sample-based instruments. Our results establish a foundational
text-to-instrument baseline, extending research in the domain of automatic
sample-based instrument generation.
( 2
min )
Recent AI research has significantly reduced the barriers to apply AI, but
the process of setting up the necessary tools and frameworks can still be a
challenge. While AI-as-a-Service platforms have emerged to simplify the
training and deployment of AI models, they still fall short of achieving true
democratization of AI. In this paper, we aim to address this gap by comparing
several popular AI-as-a-Service platforms and identifying the key requirements
for a platform that can achieve true democratization of AI. Our analysis
highlights the need for self-hosting options, high scalability, and openness.
To address these requirements, we propose our approach: the "Open Space for
Machine Learning" platform. Our platform is built on cutting-edge technologies
such as Kubernetes, Kubeflow Pipelines, and Ludwig, enabling us to overcome the
challenges of democratizing AI. We argue that our approach is more
comprehensive and effective in meeting the requirements of democratizing AI
than existing AI-as-a-Service platforms.
( 2
min )
The electrocardiogram (ECG) is a dependable instrument for assessing the
function of the cardiovascular system. There has recently been much emphasis on
precisely classifying ECGs. While many ECG conditions share numerous
similarities, little attention has been paid to classifying ECGs using graph neural
networks. In this study, we offer three distinct techniques for classifying
heartbeats using deep graph neural networks to classify the ECG signals
accurately. We suggest using different methods to extract topological features
from the ECG signal and then using a branch of the graph neural network named
graph isomorphism network for classifying the ECGs. On the PTB Diagnostics data
set, we tested the three proposed techniques. According to the findings, the
three proposed techniques achieve arrhythmia classification accuracies of
99.38, 98.76, and 91.93 percent, respectively.
( 2
min )
This study presents an innovative method for predicting the market value of
professional soccer players using explainable machine learning models. Using a
dataset curated from the FIFA website, we employ an ensemble machine learning
approach coupled with Shapley Additive exPlanations (SHAP) to provide detailed
explanations of the models' predictions. The GBDT model achieves the highest
mean R-Squared (0.8780) and the lowest mean Root Mean Squared Error
(3,221,632.175), indicating its superior performance among the evaluated
models. Our analysis reveals that specific skills such as ball control, short
passing, finishing, interceptions, dribbling, and tackling are paramount within
the skill dimension, whereas sprint speed and acceleration are critical in the
fitness dimension, and reactions are preeminent in the cognitive dimension. Our
results offer a more accurate, objective, and consistent framework for market
value estimation, presenting useful insights for managerial decisions in player
transfers.
( 2
min )
Black-box variational inference performance is sometimes hindered by the use
of gradient estimators with high variance. This variance comes from two sources
of randomness: Data subsampling and Monte Carlo sampling. While existing
control variates only address Monte Carlo noise, and incremental gradient
methods typically only address data subsampling, we propose a new "joint"
control variate that jointly reduces variance from both sources of noise. This
significantly reduces gradient variance, leading to faster optimization in
several applications.
( 2
min )
Contrastive learning has recently emerged as a promising approach for
learning data representations that discover and disentangle the explanatory
factors of the data. Previous analyses of such approaches have largely focused
on individual contrastive losses, such as noise-contrastive estimation (NCE)
and InfoNCE, and rely on specific assumptions about the data generating
process. This paper extends the theoretical guarantees for disentanglement to a
broader family of contrastive methods, while also relaxing the assumptions
about the data distribution. Specifically, we prove identifiability of the true
latents for four contrastive losses studied in this paper, without imposing
common independence assumptions. The theoretical findings are validated on
several benchmark datasets. Finally, practical limitations of these methods are
also investigated.
( 2
min )
In this paper, we develop data-dependent and algorithm-dependent
generalization bounds for transductive learning algorithms in the context of
information theory for the first time. We show that the generalization gap of
transductive learning algorithms can be bounded by the mutual information
between training labels and hypothesis. By innovatively proposing the concept
of transductive supersamples, we go beyond the inductive learning setting and
establish upper bounds in terms of various information measures. Furthermore,
we derive novel PAC-Bayesian bounds and build the connection between
generalization and loss landscape flatness under the transductive learning
setting. Finally, we present the upper bounds for adaptive optimization
algorithms and demonstrate the applications of results on semi-supervised
learning and graph learning scenarios. Our theoretic results are validated on
both synthetic and real-world datasets.
( 2
min )
The rising popularity of artificial intelligence in healthcare is
highlighting the problem that a computational model achieving super-human
clinical performance at its training sites may perform substantially worse at
new sites. In this perspective, we present common sources for this failure to
transport, which we divide into sources under the control of the experimenter
and sources inherent to the clinical data-generating process. Among the inherent
sources, we look more closely at site-specific clinical practices that can
affect the data distribution, and propose a potential solution intended to
isolate the imprint of those practices on the data from the patterns of disease
cause and effect that are the usual target of clinical models.
( 2
min )
In this paper, we present new high-probability PAC-Bayes bounds for different
types of losses. Firstly, for losses with a bounded range, we recover a
strengthened version of Catoni's bound that holds uniformly for all parameter
values. This leads to new fast rate and mixed rate bounds that are
interpretable and tighter than previous bounds in the literature. In
particular, the fast rate bound is equivalent to the Seeger--Langford bound.
Secondly, for losses with more general tail behaviors, we introduce two new
parameter-free bounds: a PAC-Bayes Chernoff analogue when the loss's cumulant
generating function is bounded, and a bound when the loss's second moment is
bounded. These two bounds are obtained using a new technique based on a
discretization of the space of possible events for the "in probability"
parameter optimization problem. This technique is both simpler and more general
than previous approaches optimizing over a grid on the parameters' space.
Finally, we extend all previous results to anytime-valid bounds using a simple
technique applicable to any existing bound.
( 2
min )
Neural networks have shown remarkable performance in computer vision, but
their deployment in numerous scientific and technical fields is challenging due
to their black-box nature. Scientists and practitioners need to evaluate the
reliability of a decision, i.e., to know simultaneously if a model relies on
the relevant features and whether these features are robust to image
corruptions. Existing attribution methods aim to provide human-understandable
explanations by highlighting important regions in the image domain, but fail to
fully characterize a decision process's reliability. To bridge this gap, we
introduce the Wavelet sCale Attribution Method (WCAM), a generalization of
attribution from the pixel domain to the space-scale domain using wavelet
transforms. Attribution in the wavelet domain reveals where and on what scales
the model focuses, thus enabling us to assess whether a decision is reliable.
Our code is accessible here:
\url{https://github.com/gabrielkasmi/spectral-attribution}.
( 2
min )
Good data stewardship requires removal of data at the request of the data's
owner. This raises the question of whether and how a trained machine-learning model,
which implicitly stores information about its training data, should be affected
by such a removal request. Is it possible to "remove" data from a
machine-learning model? We study this problem by defining certified removal: a
very strong theoretical guarantee that a model from which data is removed
cannot be distinguished from a model that never observed the data to begin
with. We develop a certified-removal mechanism for linear classifiers and
empirically study learning settings in which this mechanism is practical.
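For a simple linear model such as ridge regression, removal can even be made exact by downdating the sufficient statistics, which gives a feel for the guarantee (the paper's certified-removal mechanism is more general, using an approximate Newton update plus noise, which this sketch does not reproduce):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
lam = 1e-2

# Ridge regression via its sufficient statistics G = X'X + lam*I, b = X'y.
G = X.T @ X + lam * np.eye(3)
b = X.T @ y
w_full = np.linalg.solve(G, b)

# "Remove" sample i by subtracting its contribution from the statistics.
i = 7
G_del = G - np.outer(X[i], X[i])
b_del = b - y[i] * X[i]
w_removed = np.linalg.solve(G_del, b_del)

# Indistinguishable from retraining without sample i:
X_ret, y_ret = np.delete(X, i, 0), np.delete(y, i)
w_retrain = np.linalg.solve(X_ret.T @ X_ret + lam * np.eye(3),
                            X_ret.T @ y_ret)
```

The removed-data model coincides with full retraining up to numerical precision, which is the ideal that certified removal relaxes for richer model classes.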
( 2
min )
In this paper, we propose to develop a new Cram\'er-Rao Bound (CRB) when the
parameter to estimate lies in a manifold and follows a prior distribution. This
derivation leads to a natural inequality between an error criterion based on
geometrical properties and this new bound. This main contribution is
illustrated in the problem of covariance estimation when the data follow a
Gaussian distribution and the prior distribution is an inverse Wishart.
Numerical simulations show new results where the proposed CRB allows us to exhibit
interesting properties of the MAP estimator which are not observed with the
classical Bayesian CRB.
( 2
min )
This paper establishes the nearly optimal rate of approximation for deep
neural networks (DNNs) when applied to Korobov functions, effectively
overcoming the curse of dimensionality. The approximation results presented in
this paper are measured with respect to $L_p$ norms and $H^1$ norms. Our
achieved approximation rate demonstrates a remarkable "super-convergence" rate,
outperforming traditional methods and any continuous function approximator.
These results are non-asymptotic, providing error bounds that consider both the
width and depth of the networks simultaneously.
( 2
min )
Happ and Greven (2018) developed a methodology for principal components
analysis of multivariate functional data observed on domains of different
dimensions. Their approach relies on estimating univariate
functional principal components for each univariate functional feature. In this
paper, we present extensive simulations to investigate choosing the number of
principal components to retain. We show empirically that the conventional
approach of using a percentage of variance explained threshold for each
univariate functional feature may be unreliable when aiming to explain an
overall percentage of variance in the multivariate functional data, and thus we
advise practitioners to be careful when using it.
( 2
min )
Building out a machine learning operations (MLOps) platform in the rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML) for organizations is essential for seamlessly bridging the gap between data science experimentation and deployment while meeting the requirements around model performance, security, and compliance. In order to fulfill regulatory and compliance requirements, the […]
( 17
min )
Generative AI models for coding companions are mostly trained on publicly available source code and natural language text. While the large size of the training corpus enables the models to generate code for commonly used functionality, these models are unaware of code in private repositories and the associated coding styles that are enforced when developing […]
( 11
min )
Wield the blade and embrace the way of the samurai for some thrilling action — Onimusha: Warlords comes to GeForce NOW this week. Members can experience feudal Japan in this hack-and-slash adventure game in the cloud. It’s part of an action-packed GFN Thursday, with 16 more games joining the cloud gaming platform’s library. Forging Destinies Read article >
( 5
min )
Working together to create open-source and private datasets for AI training.
( 2
min )
It is commonly recognized that the expressiveness of deep neural networks is
contingent upon a range of factors, encompassing their depth, width, and other
relevant considerations. Currently, the practical performance of the majority
of deep neural networks remains uncertain. For ReLU (Rectified Linear Unit)
networks with piecewise linear activations, the number of linear convex regions
serves as a natural metric to gauge the network's expressivity. In this paper,
we count the number of linear convex regions in deep neural networks based on
ReLU. In particular, we prove that for any one-dimensional input, there exists
a minimum threshold for the number of neurons required to express it. We also
empirically observe that for the same network, intricate inputs hinder its
capacity to express linear regions. Furthermore, we unveil the iterative
refinement process of decision boundaries in ReLU networks during training. We
aspire for our research to serve as an inspiration for network optimization
endeavors and aids in the exploration and analysis of the behaviors exhibited
by deep networks.
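For a scalar input, the linear-region count can be measured directly by tracking changes in the activation pattern along a dense grid; the one-hidden-layer toy network below is hypothetical, not one of the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(3)

# One hidden ReLU layer on a scalar input: each neuron w*x + b switches
# on/off at most once, so the network has at most width + 1 linear
# regions over the real line.
width = 8
w, b = rng.normal(size=width), rng.normal(size=width)

def pattern(x):
    # Activation pattern: which neurons are "on" at input x.
    return tuple((w * x + b > 0).astype(int))

# Count regions along a dense 1-D grid: each change of activation
# pattern between adjacent grid points starts a new linear region.
xs = np.linspace(-10.0, 10.0, 20_001)
regions = 1 + sum(pattern(a) != pattern(c) for a, c in zip(xs, xs[1:]))
```

Deeper networks compose such patterns layer by layer, which is where the counting becomes nontrivial.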
( 2
min )
Minimum Description Length (MDL) estimators, using two-part codes for
universal coding, are analyzed. For general parametric families under certain
regularity conditions, we introduce a two-part code whose regret is close to
the minimax regret, where regret of a code with respect to a target family M is
the difference between the code length of the code and the ideal code length
achieved by an element in M. This is a generalization of the result for
exponential families by Gr\"unwald. Our code is constructed by using an
augmented structure of M with a bundle of local exponential families for data
description, which is not needed for exponential families. This result gives a
tight upper bound on risk and loss of the MDL estimators based on the theory
introduced by Barron and Cover in 1991. Further, we show that we can apply the
result to mixture families, which are a typical example of non-exponential
families.
( 2
min )
The diffusion model has shown remarkable success in computer vision, but it
remains unclear whether the ODE-based probability flow or the SDE-based
diffusion model is superior, and under what circumstances. Comparing the
two is challenging due to dependencies on data distributions, score training,
and other numerical issues. In this paper, we study the problem mathematically
for two limiting scenarios: the zero diffusion (ODE) case and the large
diffusion case. We first introduce a pulse-shape error to perturb the score
function and analyze error accumulation of sampling quality, followed by a
thorough analysis for generalization to arbitrary error. Our findings indicate
that when the perturbation occurs at the end of the generative process, the ODE
model outperforms the SDE model with a large diffusion coefficient. However,
when the perturbation occurs earlier, the SDE model outperforms the ODE model,
and we demonstrate that the error of sample generation due to such a
pulse-shape perturbation is exponentially suppressed as the diffusion term's
magnitude increases to infinity. Numerical validation of this phenomenon is
provided using Gaussian, Gaussian mixture, and Swiss roll distribution, as well
as realistic datasets like MNIST and CIFAR-10.
( 2
min )
Accurate detection of human presence in indoor environments is important for
various applications, such as energy management and security. In this paper, we
propose a novel system for human presence detection using the channel state
information (CSI) of WiFi signals. Our system named attention-enhanced deep
learning for presence detection (ALPD) employs an attention mechanism to
automatically select informative subcarriers from the CSI data and a
bidirectional long short-term memory (LSTM) network to capture temporal
dependencies in CSI. Additionally, we utilize a static feature to improve the
accuracy of human presence detection in static states. We evaluate the proposed
ALPD system by deploying a pair of WiFi access points (APs) for collecting CSI
dataset, which is further compared with several benchmarks. The results
demonstrate that our ALPD system outperforms the benchmarks in terms of
accuracy, especially in the presence of interference. Moreover, bidirectional
transmission data is beneficial to training, improving stability and accuracy,
as well as reducing the costs of data collection for training. Overall, our
proposed ALPD system shows promising results for human presence detection using
WiFi CSI signals.
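The subcarrier-attention idea can be sketched as a learned softmax weighting applied before the recurrent stage (the paper pairs this with a bidirectional LSTM, omitted here; the toy CSI matrix and fixed scores are illustrative):

```python
import numpy as np

def subcarrier_attention(csi, scores):
    """Attention over subcarriers: softmax over the score vector weights
    each subcarrier's time series, emphasizing informative subcarriers
    before the temporal model sees the data."""
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return csi @ w                      # (time, subcarriers) -> (time,)

# Toy CSI: 4 time steps x 3 subcarriers.  In training, `scores` would
# be learned; here they are fixed so subcarrier 2 dominates.
csi = np.arange(12.0).reshape(4, 3)
scores = np.array([0.0, 0.0, 10.0])
fused = subcarrier_attention(csi, scores)
```

With one score far above the rest, the fused series collapses (almost exactly) to the dominant subcarrier's series, which is the selection behavior attention is meant to learn.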
( 2
min )
We consider two popular approaches to Knowledge Graph Completion (KGC):
textual models that rely on textual entity descriptions, and structure-based
models that exploit the connectivity structure of the Knowledge Graph (KG).
Preliminary experiments show that these approaches have complementary
strengths: structure-based models perform well when the gold answer is easily
reachable from the query head in the KG, while textual models exploit
descriptions to give good performance even when the gold answer is not
reachable. In response, we explore ensembling as a way of combining the best of
both approaches. We propose a novel method for learning query-dependent
ensemble weights by using the distributions of scores assigned by individual
models to all candidate entities. Our ensemble baseline achieves
state-of-the-art results on three standard KGC datasets, with up to 6.8 pt MRR
and 8.3 pt Hits@1 gains over best individual models.
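A minimal sketch of query-dependent ensembling, using each model's peak softmax confidence as its weight (a simple hand-crafted proxy; the paper *learns* the weights from the score distributions):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def query_weighted_ensemble(scores_a, scores_b):
    """Combine two KGC models' candidate-entity scores with
    query-dependent weights derived from their score distributions --
    here, each model's maximum softmax probability."""
    pa, pb = softmax(scores_a), softmax(scores_b)
    wa, wb = pa.max(), pb.max()
    return (wa * pa + wb * pb) / (wa + wb)

# Hypothetical query: the structure-based model is confident (the gold
# answer is reachable in the KG); the textual model is not.
struct = np.array([8.0, 0.0, 0.0])
text = np.array([0.1, 0.0, 0.2])
combined = query_weighted_ensemble(struct, text)
```

The confident model dominates this query; for a query where the structure-based model is flat, the textual model would instead carry more weight.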
( 2
min )
There is currently a large gap in performance between the statistically
rigorous methods like linear regression or additive splines and the powerful
deep methods using neural networks. Previous works attempting to close this gap
have failed to fully investigate the exponentially growing number of feature
combinations which deep networks consider automatically during training. In
this work, we develop a tractable selection algorithm to efficiently identify
the necessary feature combinations by leveraging techniques in feature
interaction detection. Our proposed Sparse Interaction Additive Networks (SIAN)
construct a bridge from these simple and interpretable models to fully
connected neural networks. SIAN achieves competitive performance against
state-of-the-art methods across multiple large-scale tabular datasets and
consistently finds an optimal tradeoff between the modeling capacity of neural
networks and the generalizability of simpler methods.
( 2
min )
Large Language Models (LLMs) are huge artificial neural networks which
primarily serve to generate text, but also provide a very sophisticated
probabilistic model of language use. Since generating a semantically consistent
text requires a form of effective memory, we investigate the memory properties
of LLMs and find surprising similarities with key characteristics of human
memory. This result strongly suggests that the biological features of human
memory leave an imprint on the way that we structure our textual narratives.
( 2
min )
We study differentially private stochastic convex optimization (DP-SCO) under
user-level privacy, where each user may hold multiple data items. Existing work
for user-level DP-SCO either requires super-polynomial runtime [Ghazi et al.
(2023)] or requires the number of users to grow polynomially with the
dimensionality of the problem with additional strict assumptions [Bassily et
al. (2023)]. We develop new algorithms for user-level DP-SCO that obtain
optimal rates for both convex and strongly convex functions in polynomial time
and require the number of users to grow only logarithmically in the dimension.
Moreover, our algorithms are the first to obtain optimal rates for non-smooth
functions in polynomial time. These algorithms are based on multiple-pass
DP-SGD, combined with a novel private mean estimation procedure for
concentrated data, which applies an outlier removal step before estimating the
mean of the gradients.
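A standard building block behind such private mean estimation is the clipped-mean Gaussian mechanism; the sketch below shows that piece only (the paper's estimator additionally performs an outlier-removal step before averaging, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(4)

def private_mean(grads, clip, sigma, rng):
    """Gaussian-mechanism mean of per-user gradients: clip each user's
    gradient to norm `clip`, average, and add calibrated noise."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    clipped = grads * np.minimum(1.0, clip / norms)
    noise = rng.normal(scale=sigma * clip / len(grads),
                       size=grads.shape[1])
    return clipped.mean(axis=0) + noise

# 1000 users' gradients concentrated around (1, -1).
grads = rng.normal(loc=[1.0, -1.0], scale=0.1, size=(1000, 2))
est = private_mean(grads, clip=5.0, sigma=1.0, rng=rng)
```

For concentrated data, removing outliers first lets the clipping threshold shrink, which in turn shrinks the noise needed for the same privacy guarantee.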
( 2
min )
In this paper, we present the results of the NeurIPS-2022 Neural MMO
Challenge, which attracted 500 participants and received over 1,600
submissions. Like the previous IJCAI-2022 Neural MMO Challenge, it involved
agents from 16 populations surviving in procedurally generated worlds by
collecting resources and defeating opponents. This year's competition runs on
the latest v1.6 Neural MMO, which introduces new equipment, combat, trading,
and a better scoring system. These elements combine to pose additional
robustness and generalization challenges not present in previous competitions.
This paper summarizes the design and results of the challenge, explores the
potential of this environment as a benchmark for learning methods, and presents
some practical reinforcement learning training approaches for complex tasks
with sparse rewards. Additionally, we have open-sourced our baselines,
including environment wrappers, benchmarks, and visualization tools for future
research.
( 2
min )
Discriminatively trained, deterministic neural networks are the de facto
choice for classification problems. However, even though they achieve
state-of-the-art results on in-domain test sets, they tend to be overconfident
on out-of-distribution (OOD) data. For instance, ReLU networks -- a popular
class of neural network architectures -- have been shown to almost always yield
high confidence predictions when the test data are far away from the training
set, even when they are trained with OOD data. We overcome this problem by
adding a term to the output of the neural network that corresponds to the logit
of an extra class, that we design to dominate the logits of the original
classes as we move away from the training data. This technique provably prevents
arbitrarily high confidence on far-away test data while maintaining a simple
discriminative point-estimate training. Evaluation on various benchmarks
demonstrates strong performance against competitive baselines on both far-away
and realistic OOD data.
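The construction can be caricatured with a distance-based extra logit (a hand-crafted stand-in for the paper's learned term) that grows away from the training data and therefore dominates the original class logits far from it:

```python
import numpy as np

def with_ood_logit(logits, x, train_X, alpha=1.0):
    """Append an extra-class logit proportional to the distance from x
    to the nearest training point, so the extra class wins far away."""
    d = np.min(np.linalg.norm(train_X - x, axis=1))
    return np.append(logits, alpha * d)

train_X = np.array([[0.0, 0.0], [1.0, 1.0]])
# Near the training data, the original classes keep winning:
near = with_ood_logit(np.array([2.0, 0.5]), np.array([0.1, 0.0]), train_X)
# Far away, the extra class dominates, capping confidence:
far = with_ood_logit(np.array([2.0, 0.5]), np.array([50.0, 50.0]), train_X)
```

After a softmax over the extended logits, far-away inputs assign most probability mass to the extra class, preventing arbitrarily confident in-class predictions.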
( 2
min )
Federated learning (FL) has shown promising potential in safeguarding data
privacy in healthcare collaborations. While the term "FL" was originally coined
by the engineering community, the statistical field has also explored similar
privacy-preserving algorithms. Statistical FL algorithms, however, remain
considerably less recognized than their engineering counterparts. Our goal was
to bridge the gap by presenting the first comprehensive comparison of FL
frameworks from both engineering and statistical domains. We evaluated five FL
frameworks using both simulated and real-world data. The results indicate that
statistical FL algorithms yield less biased point estimates for model
coefficients and offer convenient confidence interval estimations. In contrast,
engineering-based methods tend to generate more accurate predictions, sometimes
surpassing central pooled and statistical FL models. This study underscores the
relative strengths and weaknesses of both types of methods, emphasizing the
need for increased awareness and their integration in future FL applications.
( 2
min )
In this paper, neural network approximation methods are developed for
elliptic partial differential equations with multi-frequency solutions. Neural
network approximation methods have advantages over classical approaches in
that they can be applied without much concern about the form of the differential
equations or the shape or dimension of the problem domain. When applied to
problems with multi-frequency solutions, the performance and accuracy of neural
network approximation methods are strongly affected by the contrast of the
high- and low-frequency parts in the solutions. To address this issue, domain
scaling and residual correction methods are proposed. The efficiency and
accuracy of the proposed methods are demonstrated for multi-frequency model
problems.
( 2
min )
Research in scientific disciplines evolves, often rapidly, over time with the
emergence of novel methodologies and their associated terminologies. While
methodologies themselves are conceptual in nature and rather difficult to
automatically extract and characterise, in this paper we seek to develop
supervised models for automatic extraction of the names of the various
constituents of a methodology, e.g., `R-CNN', `ELMo' etc. The main research
challenge for this task is effectively modeling the contexts around these
methodology component names in a few-shot or even a zero-shot setting. The main
contributions of this paper towards effectively identifying new evolving
scientific methodology names are as follows: i) we propose a factored approach
to sequence modeling, which leverages a broad-level category information of
methodology domains, e.g., `NLP', `RL' etc.; ii) to demonstrate the feasibility
of our proposed approach of identifying methodology component names under a
practical setting of fast evolving AI literature, we conduct experiments
following a simulated chronological setup (newer methodologies not seen during
the training process); iii) our experiments demonstrate that the factored
approach outperforms state-of-the-art baselines by margins of up to 9.257\% for
the methodology extraction task with the few-shot setup.
( 2
min )
We present a new high-level synthesis methodology for using large language
model tools to generate hardware designs. The methodology uses exclusively
open-source tools excluding the large language model. As a case study, we use
our methodology to generate a permuted congruential random number generator
design with a wishbone interface. We verify the functionality and quality of
the random number generator design using large language model-generated
simulations and the Dieharder randomness test suite. We document all the large
language model chat logs, Python scripts, Verilog scripts, and simulation
results used in the case study. We believe that our method of hardware design
generation coupled with the open source silicon 130 nm design tools will
revolutionize application-specific integrated circuit design. Our methodology
significantly lowers the bar to entry when building domain-specific computing
accelerators for the Internet of Things and proof of concept prototypes for
later fabrication in more modern process nodes.
( 2
min )
Federated learning (FL) is an emerging paradigm for training deep neural
networks (DNNs) in a distributed manner. Current FL approaches suffer from high
communication overhead and the risk of information leakage. In this work, we present a
federated learning algorithm based on evolution strategies (FedES), a
zeroth-order training method. Instead of transmitting model parameters, FedES
only communicates loss values, and thus has very low communication overhead.
Moreover, a third party is unable to estimate gradients without knowing the
pre-shared seed, which protects data privacy. Experimental results demonstrate
FedES can achieve the above benefits while keeping convergence performance the
same as that with back propagation methods.
( 2
min )
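The seed-sharing idea behind FedES can be sketched in a few lines. In this minimal sketch (not the authors' implementation; the quadratic loss, step sizes, and helper names are illustrative assumptions), the client transmits only scalar loss values, and the server regenerates the identical perturbations from the pre-shared seed to form an evolution-strategies gradient estimate:

```python
import numpy as np

def client_losses(theta, seed, sigma, n_perturb, loss_fn):
    """Client side: evaluate the loss at seed-derived perturbations of theta.
    Only these scalar losses are communicated, never the parameters."""
    rng = np.random.default_rng(seed)                 # pre-shared seed
    eps = rng.standard_normal((n_perturb, theta.size))
    return np.array([loss_fn(theta + sigma * e) for e in eps])

def server_update(theta, losses, seed, sigma, lr, n_perturb):
    """Server side: regenerate the same perturbations from the shared seed and
    form the evolution-strategies gradient estimate from the loss values."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_perturb, theta.size))
    adv = losses - losses.mean()                      # mean baseline (variance reduction)
    grad = (adv[:, None] * eps).sum(axis=0) / (n_perturb * sigma)
    return theta - lr * grad

# Toy objective: quadratic loss around a target vector.
target = np.array([1.0, -2.0, 0.5])
loss_fn = lambda w: float(np.sum((w - target) ** 2))

theta = np.zeros(3)
for step in range(300):
    seed = 1000 + step                 # seed schedule known to both sides
    losses = client_losses(theta, seed, 0.1, 50, loss_fn)
    theta = server_update(theta, losses, seed, 0.1, 0.05, 50)

print(loss_fn(theta))                  # close to 0
```

A third party observing only the loss values cannot regenerate `eps` without the seed, which is the privacy argument sketched in the abstract.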
One of the most promising developments in computer vision in recent years is
the use of generative neural networks for functionality-conditioned 3D design
reconstruction and generation. Here, neural networks learn the dependencies
between functionalities and geometry in a very effective way. For a neural
network, the functionalities are translated into conditions on a certain
geometry. But the more conditions the design generation needs to reflect, the
more difficult it is to learn clear dependencies. This leads to a
multi-criteria design problem due to the various conditions, which are so far
not considered in the neural network structure.
In this paper, we address this multi-criteria challenge for a 3D design use
case related to an unmanned aerial vehicle (UAV) motor mount. We generate
10,000 abstract 3D designs and subject them all to simulations for three
physical disciplines: mechanics, thermodynamics, and aerodynamics. Then, we
train a Conditional Variational Autoencoder (CVAE) using the geometry and
corresponding multicriteria functional constraints as input. We use our trained
CVAE as well as the Marching cubes algorithm to generate meshes for simulation
based evaluation. The results are then evaluated with the generated UAV
designs. Subsequently, we demonstrate the ability to generate optimized designs
under self-defined functionality conditions using the trained neural network.
( 3
min )
Consistency-based diagnosis is an established approach to diagnosing technical
applications, but it requires significant modeling effort, especially for
dynamic multi-modal time series. Machine learning seems to be an obvious
solution, which becomes less obvious when looking at details: Which notion of
consistency can be used? If logical calculi are still to be used, how can
dynamic time series be transferred into the discrete world?
This paper presents the methodology Discret2Di for automated learning of
logical expressions for consistency-based diagnosis. While these logical
calculi have advantages by providing a clear notion of consistency, they have
the key problem of relying on a discretization of the dynamic system. The
solution presented combines machine learning from both the time series and the
symbolic domain to automate the learning of logical rules for consistency-based
diagnosis.
( 2
min )
The adoption of diagnosis and prognostic algorithms in healthcare has led to
concerns about the perpetuation of bias against disadvantaged groups of
individuals. Deep learning methods to detect and mitigate bias have revolved
around modifying models, optimization strategies, and threshold calibration
with varying levels of success. Here, we develop a data-centric,
model-agnostic, task-agnostic approach to evaluate dataset bias by
investigating how easily different groups are learned at small sample sizes
(AEquity). We then apply a systematic analysis of AEq values across
subpopulations to identify and mitigate manifestations of racial bias in two
known cases in healthcare - chest X-ray diagnosis with deep
convolutional neural networks and healthcare utilization prediction with
multivariate logistic regression. AEq is a novel and broadly applicable metric
that can be applied to advance equity by diagnosing and remediating bias in
healthcare datasets.
( 2
min )
Visualization tools can help synthetic biologists and molecular programmers
understand the complex reactive pathways of nucleic acid reactions, which can
be designed for many potential applications and can be modelled using a
continuous-time Markov chain (CTMC). Here we present ViDa, a new visualization
approach for DNA reaction trajectories that uses a 2D embedding of the
secondary structure state space underlying the CTMC model. To this end, we
integrate a scattering transform of the secondary structure adjacency, a
variational autoencoder, and a nonlinear dimensionality reduction method. We
augment the training loss with domain-specific supervised terms that capture
both thermodynamic and kinetic features. We assess ViDa on two well-studied DNA
hybridization reactions. Our results demonstrate that the domain-specific
features lead to significant quality improvements over the state-of-the-art in
DNA state space visualization, successfully separating different folding
pathways and thus providing useful insights into dominant reaction mechanisms.
( 2
min )
We attempt to generate new bridge types using generative artificial
intelligence technology. Grayscale images of bridge facades with varying
component widths were rendered with the 3ds Max animation software, and the
OpenCV module then applied a suitable set of geometric transformations
(rotation, horizontal scaling, vertical scaling) to obtain an image dataset of
three-span beam, arch, cable-stayed, and suspension bridges. Using the Python
programming language and the TensorFlow and Keras deep learning frameworks, a
variational autoencoder was constructed and trained, yielding a low-dimensional
bridge-type latent space that is convenient for vector operations. The
variational autoencoder can combine two original human-designed bridge types
into a new one. Generative artificial intelligence technology can assist bridge
designers in bridge-type innovation, serving as a copilot.
( 2
min )
We introduce AdaSub, a stochastic optimization algorithm that computes a
search direction based on second-order information in a low-dimensional
subspace that is defined adaptively based on available current and past
information. Compared to first-order methods, second-order methods exhibit
better convergence characteristics, but the need to compute the Hessian matrix
at each iteration results in excessive computational expenses, making them
impractical. To address this issue, our approach enables the management of
computational expenses and algorithm efficiency by enabling the selection of
the subspace dimension for the search. Our code is freely available on GitHub,
and our preliminary numerical results demonstrate that AdaSub surpasses popular
stochastic optimizers in terms of time and number of iterations required to
reach a given accuracy.
( 2
min )
As control engineering methods are applied to increasingly complex systems,
data-driven approaches for system identification appear as a promising
alternative to physics-based modeling. While the Bayesian approaches prevalent
for safety-critical applications usually rely on the availability of state
measurements, the states of a complex system are often not directly measurable.
It may then be necessary to jointly estimate the dynamics and the latent state,
making the quantification of uncertainties and the design of controllers with
formal performance guarantees considerably more challenging. This paper
proposes a novel method for the computation of an optimal input trajectory for
unknown nonlinear systems with latent states based on a combination of particle
Markov chain Monte Carlo methods and scenario theory. Probabilistic performance
guarantees are derived for the resulting input trajectory, and an approach to
validate the performance of arbitrary control laws is presented. The
effectiveness of the proposed method is demonstrated in a numerical simulation.
( 2
min )
The mean shift (MS) algorithm seeks a mode of the kernel density estimate
(KDE). This study presents a convergence guarantee of the mode estimate
sequence generated by the MS algorithm and an evaluation of the convergence
rate, under fairly mild conditions, with the help of the argument concerning
the Łojasiewicz inequality. Our findings extend existing ones covering
analytic kernels and the Epanechnikov kernel. These results are significant in that
they cover the biweight kernel, which is optimal among non-negative kernels in
terms of the asymptotic statistical efficiency for the KDE-based mode
estimation.
( 2
min )
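The MS iteration analyzed above is short enough to state directly. The following sketch (a generic Gaussian-kernel version rather than the biweight-kernel setting the paper emphasizes; the data and bandwidth are illustrative) moves the estimate to the kernel-weighted mean of the data at each step:

```python
import numpy as np

def mean_shift(x0, data, h, n_iter=100):
    """Mean shift mode seeking on a Gaussian KDE with bandwidth h:
    each step moves the estimate to the kernel-weighted mean of the data."""
    x = x0
    for _ in range(n_iter):
        w = np.exp(-0.5 * np.sum((data - x) ** 2, axis=1) / h**2)
        x = (w[:, None] * data).sum(axis=0) / w.sum()
    return x

rng = np.random.default_rng(0)
# Two clusters; starting near the left one converges to its local mode.
data = np.vstack([rng.normal(-2, 0.3, (100, 2)),
                  rng.normal(2, 0.3, (100, 2))])
mode = mean_shift(np.array([-1.5, -1.5]), data, h=0.5)
print(mode)   # near (-2, -2)
```

Each update is exactly the fixed-point step whose convergence rate the paper bounds via the Łojasiewicz inequality.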
We present an exact Bayesian inference method for discrete statistical
models, which can find exact solutions to a large class of discrete inference
problems, even with infinite support and continuous priors. To express such
models, we introduce a probabilistic programming language that supports
discrete and continuous sampling, discrete observations, affine functions,
(stochastic) branching, and conditioning on discrete events. Our key tool is
probability generating functions: they provide a compact closed-form
representation of distributions that are definable by programs, thus enabling
the exact computation of posterior probabilities, expectation, variance, and
higher moments. Our inference method is provably correct and fully automated in
a tool called Genfer, which uses automatic differentiation (specifically,
Taylor polynomials), but does not require computer algebra. Our experiments
show that Genfer is often faster than the existing exact inference tools PSI,
Dice, and Prodigy. On a range of real-world inference problems that none of
these exact tools can solve, Genfer's performance is competitive with
approximate Monte Carlo methods, while avoiding approximation errors.
( 2
min )
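The role of probability generating functions can be illustrated on a toy model with finite support, where the PGF is just a polynomial whose coefficient of x^k is P(X = k). (Genfer itself uses Taylor-polynomial arithmetic to handle infinite support; this sketch is not its implementation.)

```python
import numpy as np
from numpy.polynomial import polynomial as P

# PGF of a Binomial(10, 0.3): G(x) = (0.7 + 0.3 x)^10, stored by coefficients.
g = np.array([1.0])
for _ in range(10):
    g = P.polymul(g, np.array([0.7, 0.3]))

dg = P.polyder(g)                              # G'(x)
d2g = P.polyder(g, 2)                          # G''(x)
mean = P.polyval(1.0, dg)                      # E[X] = G'(1)
var = P.polyval(1.0, d2g) + mean - mean**2     # Var = G''(1) + G'(1) - G'(1)^2

# Conditioning on the discrete event {X <= 3}: zero out higher coefficients
# and renormalize, then read off the posterior mean the same way.
cond = np.where(np.arange(len(g)) <= 3, g, 0.0)
cond /= cond.sum()
post_mean = P.polyval(1.0, P.polyder(cond))

print(mean, var, post_mean)   # 3.0, 2.1, and a posterior mean below 3
```

Moments and posterior probabilities are read off exactly from the representation, with no sampling or approximation error.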
We study the training dynamics of a shallow neural network with quadratic
activation functions and quadratic cost in a teacher-student setup. In line
with previous works on the same neural architecture, the optimization is
performed following the gradient flow on the population risk, where the average
over data points is replaced by the expectation over their distribution,
assumed to be Gaussian. We first derive convergence properties for the gradient
flow and quantify the overparameterization that is necessary to achieve a
strong signal recovery. Then, assuming that the teachers and the students at
initialization form independent orthonormal families, we derive a
high-dimensional limit for the flow and show that the minimal
overparameterization is sufficient for strong recovery. We verify by numerical
experiments that these results hold for more general initializations.
( 2
min )
We study scalable machine learning models for full event reconstruction in
high-energy electron-positron collisions based on a highly granular detector
simulation. Particle-flow reconstruction can be formulated as a supervised
learning task using tracks and calorimeter clusters or hits. We compare a graph
neural network and kernel-based transformer and demonstrate that both avoid
quadratic memory allocation and computational cost while achieving realistic
reconstruction. We show that hyperparameter tuning on a supercomputer
significantly enhances the physics performance of the models, improving the jet
transverse momentum resolution by up to 50% compared to the baseline. The
resulting model is highly portable across hardware processors. Finally, we
demonstrate that the model can be trained on highly granular inputs consisting
of tracks and calorimeter hits, resulting in a competitive physics performance
with the baseline. Datasets and software to reproduce the studies are published
following the findable, accessible, interoperable, and reusable principles.
( 2
min )
The aim of this paper is to make clear and precise the relationship between
the Rubin causal model (RCM) and structural causal model (SCM) frameworks for
causal inference. Adopting a neutral logical perspective, and drawing on
previous work, we show what is required for an RCM to be representable by an
SCM. A key result then shows that every RCM -- including those that violate
algebraic principles implied by the SCM framework -- emerges as an abstraction
of some representable RCM. Finally, we illustrate the power of this
conciliatory perspective by pinpointing an important role for SCM principles in
classic applications of RCMs; conversely, we offer a characterization of the
algebraic constraints implied by a graph, helping to substantiate further
comparisons between the two frameworks.
( 2
min )
There is currently a large gap in performance between the statistically
rigorous methods like linear regression or additive splines and the powerful
deep methods using neural networks. Previous works attempting to close this gap
have failed to fully investigate the exponentially growing number of feature
combinations which deep networks consider automatically during training. In
this work, we develop a tractable selection algorithm to efficiently identify
the necessary feature combinations by leveraging techniques in feature
interaction detection. Our proposed Sparse Interaction Additive Networks (SIAN)
construct a bridge from these simple and interpretable models to fully
connected neural networks. SIAN achieves competitive performance against
state-of-the-art methods across multiple large-scale tabular datasets and
consistently finds an optimal tradeoff between the modeling capacity of neural
networks and the generalizability of simpler methods.
( 2
min )
We study versions of Hilbert's projective metric for spaces of integrable
functions of bounded growth. These metrics originate from cones which are
relaxations of the cone of all non-negative functions, in the sense that they
include all functions having non-negative integral values when multiplied with
certain test functions. We show that kernel integral operators are contractions
with respect to suitable specifications of such metrics even for kernels which
are not bounded away from zero, provided that the decay to zero of the kernel
is controlled. As an application to entropic optimal transport, we show
exponential convergence of Sinkhorn's algorithm in settings where the marginal
distributions have sufficiently light tails compared to the growth of the cost
function.
( 2
min )
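The Sinkhorn iterations whose convergence is analyzed above can be sketched for discrete marginals (a standard bounded-cost textbook version, not the unbounded light-tailed setting of the paper; the marginals, cost, and regularization strength are illustrative):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.5, n_iter=200):
    """Sinkhorn iterations for entropic optimal transport between
    discrete marginals mu, nu with cost matrix C."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)               # match column marginal
        u = mu / (K @ v)                 # match row marginal
    return u[:, None] * K * v[None, :]   # transport plan

x = np.linspace(0, 1, 5)
C = (x[:, None] - x[None, :]) ** 2       # squared-distance cost
mu = np.ones(5) / 5
nu = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
P = sinkhorn(mu, nu, C)
print(np.abs(P.sum(axis=0) - nu).max())  # essentially zero after convergence
```

The contraction argument in the paper concerns exactly these alternating marginal-matching updates, viewed through (relaxed) Hilbert projective metrics.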
In this post, we show you how to create a MAP connector to AWS HealthImaging, which is reusable in applications built with the MONAI Deploy App SDK, to integrate with and accelerate image data retrieval from a cloud-native DICOM store to medical imaging AI workloads. The MONAI Deploy SDK can be used to support hospital operations. We also demonstrate two hosting options to deploy MAP AI applications on SageMaker at scale.
( 10
min )
This post explores how Amazon CodeWhisperer can help with code optimization for sustainability through increased resource efficiency. Computationally resource-efficient coding is one technique that aims to reduce the amount of energy required to process a line of code and, as a result, aid companies in consuming less energy overall. In this era of cloud computing, […]
( 8
min )
NVIDIA’s AI platform raised the bar for AI training and high performance computing in the latest MLPerf industry benchmarks. Among many new records and milestones, one in generative AI stands out: NVIDIA Eos — an AI supercomputer powered by a whopping 10,752 NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking — completed a Read article >
( 7
min )
When patients in Vietnam enter a medical facility in distress, doctors use NVIDIA technology to get more accurate scans to diagnose their ailments. In Hong Kong, a different set of doctors leverage generative AI to discover new cures for patients. Improving the health and well-being of citizens and strengthening economies and communities are key themes Read article >
( 6
min )
Clinician-led healthcare AI company Harrison.ai has built an AI system that effectively serves as a “spell checker” for radiologists — flagging critical findings to improve the speed and accuracy of radiology image analysis, reducing misdiagnoses. In the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Harrison.ai cofounder and CEO Aengus Tran about Read article >
( 6
min )
Neural Radiance Fields (NeRF) enable 3D scene reconstruction from 2D images
and camera poses for Novel View Synthesis (NVS). Although NeRF can produce
photorealistic results, it often suffers from overfitting to training views,
leading to poor geometry reconstruction, especially in low-texture areas. This
limitation restricts many important applications which require accurate
geometry, such as extrapolated NVS, HD mapping and scene editing. To address
this limitation, we propose a new method to improve NeRF's 3D structure using
only RGB images and semantic maps. Our approach introduces a novel plane
regularization based on Singular Value Decomposition (SVD), that does not rely
on any geometric prior. In addition, we leverage the Structural Similarity
Index Measure (SSIM) in our loss design to properly initialize the volumetric
representation of NeRF. Quantitative and qualitative results show that our
method outperforms popular regularization approaches in accurate geometry
reconstruction for large-scale outdoor scenes and achieves SoTA rendering
quality on the KITTI-360 NVS benchmark.
( 2
min )
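The SVD-based planarity idea can be illustrated on raw point sets: the smallest singular value of a centered point cloud vanishes exactly when the points are coplanar, so it can serve as a plane-regularization penalty (a toy sketch of the principle, not the paper's NeRF loss; the point sets are synthetic):

```python
import numpy as np

def planarity_loss(points):
    """SVD-based planarity penalty: the smallest singular value of the
    centered point set is zero iff the points lie on a common plane."""
    centered = points - points.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s[-1]

rng = np.random.default_rng(0)
# Points on the plane z = 0.3x - 0.2y, plus an off-plane noisy version.
xy = rng.uniform(-1, 1, (100, 2))
flat = np.column_stack([xy, 0.3 * xy[:, 0] - 0.2 * xy[:, 1]])
bumpy = flat + np.column_stack([np.zeros((100, 2)),
                                rng.normal(0, 0.2, 100)])

print(planarity_loss(flat), planarity_loss(bumpy))  # ~0 vs clearly positive
```

Because the penalty is computed from the points themselves, it needs no geometric prior, which matches the motivation stated in the abstract.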
A significant challenge facing researchers in the area of multi-agent
reinforcement learning (MARL) pertains to the identification of a library that
can offer fast and compatible development for multi-agent tasks and algorithm
combinations, while obviating the need to consider compatibility issues. In
this paper, we present MARLlib, a library designed to address the
aforementioned challenge by leveraging three key mechanisms: 1) a standardized
multi-agent environment wrapper, 2) an agent-level algorithm implementation,
and 3) a flexible policy mapping strategy. By utilizing these mechanisms,
MARLlib can effectively disentangle the intertwined nature of the multi-agent
task and the learning process of the algorithm, with the ability to
automatically alter the training strategy based on the current task's
attributes. The MARLlib library's source code is publicly accessible on GitHub:
https://github.com/Replicable-MARL/MARLlib.
( 2
min )
A quantum thermal machine is an open quantum system that enables the
conversion between heat and work at the micro or nano-scale. Optimally
controlling such out-of-equilibrium systems is a crucial yet challenging task
with applications to quantum technologies and devices. We introduce a general
model-free framework based on Reinforcement Learning to identify
out-of-equilibrium thermodynamic cycles that are Pareto optimal trade-offs
between power and efficiency for quantum heat engines and refrigerators. The
method does not require any knowledge of the quantum thermal machine, nor of
the system model, nor of the quantum state. Instead, it only observes the heat
fluxes, so it is both applicable to simulations and experimental devices. We
test our method on a model of an experimentally realistic refrigerator based on
a superconducting qubit, and on a heat engine based on a quantum harmonic
oscillator. In both cases, we identify the Pareto-front representing optimal
power-efficiency tradeoffs, and the corresponding cycles. Such solutions
outperform previous proposals made in the literature, such as optimized Otto
cycles, reducing quantum friction.
( 2
min )
In this paper, we introduce faster first-order primal-dual algorithms for
minimizing a convex function subject to strongly convex function constraints.
Before our work, the best complexity bound was $\mathcal{O}(1/{\varepsilon})$,
and it remained unclear how to improve this result by leveraging the strong
convexity assumption. We address this issue by developing novel techniques to
progressively estimate the strong convexity of the Lagrangian function. Our
approach yields an improved complexity of $\mathcal{O}(1/\sqrt{\varepsilon})$,
matching the complexity lower bound for strongly-convex-concave saddle point
optimization. We show the superior performance of our methods in
sparsity-inducing constrained optimization, notably Google's personalized
PageRank problem. Furthermore, we show that a restarted version of the proposed
methods can effectively identify the sparsity pattern of the optimal solution
within a finite number of steps, a result that appears to have independent
significance.
( 2
min )
Imitation learning of robot policies from few demonstrations is crucial in
open-ended applications. We propose a new method, Interaction Warping, for
learning SE(3) robotic manipulation policies from a single demonstration. We
infer the 3D mesh of each object in the environment using shape warping, a
technique for aligning point clouds across object instances. Then, we represent
manipulation actions as keypoints on objects, which can be warped with the
shape of the object. We show successful one-shot imitation learning on three
simulated and real-world object re-arrangement tasks. We also demonstrate the
ability of our method to predict object meshes and robot grasps in the wild.
( 2
min )
Interatomic potentials learned using machine learning methods have been
successfully applied to atomistic simulations. However, accurate models require
large training datasets, while generating reference calculations is
computationally demanding. To bypass this difficulty, we propose a transfer
learning algorithm that leverages the ability of graph neural networks (GNNs)
to represent chemical environments together with kernel mean embeddings. We
extract a feature map from GNNs pre-trained on the OC20 dataset and use it to
learn the potential energy surface from system-specific datasets of catalytic
processes. Our method is further enhanced by incorporating into the kernel the
chemical species information, resulting in improved performance and
interpretability. We test our approach on a series of realistic datasets of
increasing complexity, showing excellent generalization and transferability
performance, and improving on methods that rely on GNNs or ridge regression
alone, as well as similar fine-tuning approaches.
( 2
min )
Weakly supervised semantic segmentation (WSSS) aims to bypass the need for
laborious pixel-level annotation by using only image-level annotation. Most
existing methods rely on Class Activation Maps (CAM) to derive pixel-level
pseudo-labels and use them to train a fully supervised semantic segmentation
model. Although these pseudo-labels are class-aware, indicating the coarse
regions for particular classes, they are not object-aware and fail to delineate
accurate object boundaries. To address this, we introduce a simple yet
effective method harnessing the Segment Anything Model (SAM), a class-agnostic
foundation model capable of producing fine-grained instance masks of objects,
parts, and subparts. We use CAM pseudo-labels as cues to select and combine SAM
masks, resulting in high-quality pseudo-labels that are both class-aware and
object-aware. Our approach is highly versatile and can be easily integrated
into existing WSSS methods without any modification. Despite its simplicity,
our approach shows consistent gain over the state-of-the-art WSSS methods on
both PASCAL VOC and MS-COCO datasets.
( 2
min )
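The cue-based selection step can be sketched with plain arrays: score each class-agnostic mask by its mean CAM activation and union the masks that exceed a threshold (a toy sketch of the idea with hypothetical data; the actual method operates on SAM-produced masks and CAM pseudo-labels):

```python
import numpy as np

def combine_masks(cam, masks, tau=0.3):
    """Select class-agnostic masks whose mean CAM activation exceeds tau
    and union them into a class-aware, object-aware pseudo-label."""
    out = np.zeros_like(cam, dtype=bool)
    for m in masks:
        if cam[m].mean() > tau:          # CAM acts as the selection cue
            out |= m
    return out

# Toy 8x8 example: a coarse CAM blob and two candidate instance masks.
cam = np.zeros((8, 8))
cam[2:6, 2:6] = 0.9                                        # coarse activation
obj = np.zeros((8, 8), dtype=bool); obj[1:6, 1:7] = True   # object-shaped mask
bg = np.zeros((8, 8), dtype=bool); bg[6:8, :] = True       # background mask

pseudo = combine_masks(cam, [obj, bg])
print(pseudo.sum())   # only the object mask is kept
```

The resulting pseudo-label follows the fine-grained mask boundary rather than the coarse CAM blob, which is the object-awareness the abstract emphasizes.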
Convolutional neural networks require efficient algorithms to reduce
complexity and sufficient utilization of parallel processors for acceleration.
Within convolutional layers, there are three types of operators: convolution
used in forward propagation, deconvolution and dilated-convolution utilized in
backward propagation. During the execution of these operators, zeros are
typically added to tensors, leading to redundant calculations and unnecessary
strain on hardware. To circumvent these inefficiencies, we propose the C-K-S
algorithm, accompanied by efficient GPU implementations. C-K-S trims filters to
exclude zero-padding. For deconvolution and dilated-convolution, C-K-S
transforms sparse tensors into dense tensors, and standardizes the local
computational rules to simplify the hardware control. The experimental results
demonstrate that C-K-S offers good performance in terms of speed and
convergence, surpassing the capabilities of PyTorch and cuDNN in certain
scenarios.
( 2
min )
This work introduces the first small-loss and gradual-variation regret bounds
for online portfolio selection, marking the first instances of data-dependent
bounds for online convex optimization with non-Lipschitz, non-smooth losses.
The algorithms we propose exhibit sublinear regret rates in the worst cases and
achieve logarithmic regrets when the data is "easy," with per-iteration time
almost linear in the number of investment alternatives. The regret bounds are
derived using novel smoothness characterizations of the logarithmic loss, a
local norm-based analysis of following the regularized leader (FTRL) with
self-concordant regularizers, which are not necessarily barriers, and an
implicit variant of optimistic FTRL with the log-barrier.
( 2
min )
We demonstrate a validity problem of machine learning in the vital
application area of disease diagnosis in medicine. It arises when target labels
in training data are determined by an indirect measurement, and the fundamental
measurements needed to determine this indirect measurement are included in the
input data representation. Machine learning models trained on this data will
learn nothing else but to exactly reconstruct the known target definition. Such
models show perfect performance on similarly constructed test data but will
fail catastrophically on real-world examples where the defining fundamental
measurements are not or only incompletely available. We present a general
procedure allowing identification of problematic datasets and black-box machine
learning models trained on them, and exemplify our detection procedure on the
task of early prediction of sepsis.
( 2
min )
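The failure mode can be reproduced in a few lines: define the label by thresholding one input feature (a hypothetical "lactate" measurement standing in for the indirect target definition), fit any model, and observe that performance collapses toward the majority-class rate once that feature is unavailable (a simplified illustration, not the paper's detection procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
lactate = rng.normal(2.0, 1.0, n)           # "fundamental measurement"
other = rng.normal(0.0, 1.0, (n, 3))        # weakly related features
y = (lactate > 2.5).astype(float)           # label *defined* by the measurement

X = np.column_stack([np.ones(n), lactate, other])
# Least-squares classifier (threshold at 0.5) as a stand-in for any model.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
acc_full = ((X @ w > 0.5) == y).mean()      # looks excellent on test-like data

X_missing = X.copy()
X_missing[:, 1] = lactate.mean()            # measurement unavailable at deployment
acc_missing = ((X_missing @ w > 0.5) == y).mean()   # near majority-class rate

print(acc_full, acc_missing)
```

The model has learned nothing beyond reconstructing the target definition, so its apparent performance vanishes exactly when the defining measurement is withheld.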
Estimating a prediction function is a fundamental component of many data
analyses. The Super Learner ensemble, a particular implementation of stacking,
has desirable theoretical properties and has been used successfully in many
applications. Dimension reduction can be accomplished by using variable
screening algorithms, including the lasso, within the ensemble prior to fitting
other prediction algorithms. However, the performance of a Super Learner using
the lasso for dimension reduction has not been fully explored in cases where
the lasso is known to perform poorly. We provide empirical results that suggest
that a diverse set of candidate screening algorithms should be used to protect
against poor performance of any one screen, similar to the guidance for
choosing a library of prediction algorithms for the Super Learner.
( 2
min )
Kernel density estimation (KDE) is integral to a range of generative and
discriminative tasks in machine learning. Drawing upon tools from the
multidimensional calculus of variations, we derive an optimal weight function
that reduces bias in standard kernel density estimates for density ratios,
leading to improved estimates of prediction posteriors and
information-theoretic measures. In the process, we shed light on some
fundamental aspects of density estimation, particularly from the perspective of
algorithms that employ KDEs as their main building blocks.
( 2
min )
We propose the Kuramoto Graph Neural Network (KuramotoGNN), a novel class of
continuous-depth graph neural networks (GNNs) that employs the Kuramoto model
to mitigate the over-smoothing phenomenon, in which node features in GNNs
become indistinguishable as the number of layers increases. The Kuramoto model
captures the synchronization behavior of non-linear coupled oscillators. Under
the viewpoint of coupled oscillators, we first show the connection between the
Kuramoto model and basic GNNs, and then show that the over-smoothing phenomenon
in GNNs can be interpreted as phase synchronization in the Kuramoto model. The KuramotoGNN
replaces this phase synchronization with frequency synchronization to prevent
the node features from converging into each other while allowing the system to
reach a stable synchronized state. We experimentally verify the advantages of
the KuramotoGNN over the baseline GNNs and existing methods in reducing
over-smoothing on various graph deep learning benchmark tasks.
( 2
min )
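The two notions of synchronization contrasted above can be seen in the plain Kuramoto model (a toy simulation of the oscillator dynamics, not the KuramotoGNN itself; coupling strength and frequency spread are illustrative):

```python
import numpy as np

def kuramoto(theta0, omega, K, dt=0.01, n_steps=2000):
    """Euler integration of the Kuramoto model
    d theta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)."""
    theta = theta0.copy()
    N = len(theta)
    for _ in range(n_steps):
        coupling = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
        theta += dt * (omega + K / N * coupling)
    return theta

rng = np.random.default_rng(0)
N = 50
theta0 = rng.uniform(0, 2 * np.pi, N)

# Identical natural frequencies: strong coupling drives phase synchronization,
# the analogue of over-smoothing (all node features collapse together).
r_phase = np.abs(np.exp(1j * kuramoto(theta0, np.zeros(N), K=2.0)).mean())

# Distinct natural frequencies: phases stay spread out, the analogue of
# frequency synchronization preserving distinguishable node features.
omega = rng.normal(0, 1.0, N)
r_freq = np.abs(np.exp(1j * kuramoto(theta0, omega, K=2.0)).mean())

print(r_phase, r_freq)   # order parameter near 1 vs clearly below 1
```

The order parameter r measures phase coherence; keeping it away from 1 while the oscillators lock to a common rhythm is the mechanism the KuramotoGNN exploits.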
In biomedical applications it is often necessary to estimate a physiological
response to a treatment consisting of multiple components, and learn the
separate effects of the components in addition to the joint effect. Here, we
extend existing probabilistic nonparametric approaches to explicitly address
this problem. We also develop a new convolution-based model for composite
treatment-response curves that is more biologically interpretable. We validate
our models by estimating the impact of carbohydrate and fat in meals on blood
glucose. By differentiating treatment components, incorporating their dosages,
and sharing statistical information across patients via a hierarchical
multi-output Gaussian process, our method improves prediction accuracy over
existing approaches, and allows us to interpret the different effects of
carbohydrates and fat on the overall glucose response.
( 2
min )
We show that the likelihood function for a multinomial vector observed under
arbitrary interval censoring constraints on the frequencies or their partial
sums is completely log-concave by proving that the constrained sample spaces
comprise M-convex subsets of the discrete simplex.
( 2
min )
This paper studies Anderson acceleration (AA) for fixed-point methods
${x}^{(k+1)}=q({x}^{(k)})$. It provides the first proof that when the operator
$q$ is linear and symmetric, AA improves the root-linear convergence factor
over the fixed-point iterations. When $q$ is nonlinear, yet has a symmetric
Jacobian at the solution, a slightly modified AA algorithm is proved to have an
analogous root-linear convergence factor improvement over fixed-point
iterations. Simulations verify our observations. Furthermore, experiments with
different data models demonstrate AA is significantly superior to the standard
fixed-point methods for Tyler's M-estimation.
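A compact sketch of Type-II Anderson acceleration on a linear symmetric fixed-point map follows. The window size, operator size, and spectral radius are arbitrary illustrative choices, and the test problem is a generic symmetric contraction rather than Tyler's M-estimation.

```python
import numpy as np

def anderson(q, x0, m=5, iters=40):
    """Type-II Anderson acceleration for the fixed-point iteration x <- q(x)."""
    x = x0.copy()
    X, Q = [], []  # sliding history of iterates and their images
    for _ in range(iters):
        qx = q(x)
        X.append(x.copy())
        Q.append(qx.copy())
        if len(X) > m + 1:
            X.pop(0)
            Q.pop(0)
        if len(X) == 1:
            x = qx
            continue
        F = np.stack(Q, axis=1) - np.stack(X, axis=1)  # residual history f_i = q(x_i) - x_i
        dF = np.diff(F, axis=1)
        dQ = np.diff(np.stack(Q, axis=1), axis=1)
        # gamma minimizes || f_k - dF @ gamma ||_2 (least-squares mixing)
        gamma, *_ = np.linalg.lstsq(dF, F[:, -1], rcond=None)
        x = qx - dQ @ gamma
    return x

# Linear symmetric contraction q(x) = A x + b with spectral radius 0.9
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
S = M + M.T
A = 0.9 * S / np.linalg.norm(S, 2)  # symmetric, ||A||_2 = 0.9
b = rng.standard_normal(20)
q = lambda x: A @ x + b
x_star = np.linalg.solve(np.eye(20) - A, b)

x_aa = anderson(q, np.zeros(20))
x_plain = np.zeros(20)
for _ in range(40):
    x_plain = q(x_plain)  # plain fixed-point iteration for comparison
```

On this symmetric linear problem, the accelerated iterate reaches a much smaller error than the plain fixed-point iteration after the same number of iterations, consistent with the improved root-linear convergence factor the paper proves.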
( 2
min )
We are seeing a flurry of regulation. But we should ask ourselves if we are seeing regulatory capture, i.e. letting corporations write lax rules that lead to public harm. Andrew Ng points out some contradictions: “It’s also a mistake to set reporting requirements based on a computation threshold for model training. This will stifle… Read More »Regulatory Capture: Why AI regulation favours the incumbents
The post Regulatory Capture: Why AI regulation favours the incumbents appeared first on Data Science Central.
( 20
min )
Large language models (LLMs), with their broad knowledge, can generate human-like text on almost any topic. However, their training on massive datasets also limits their usefulness for specialized tasks. Without continued learning, these models remain oblivious to new data and trends that emerge after their initial training. Furthermore, the cost to train new LLMs can […]
( 14
min )
This research paper was presented at the 64th IEEE Symposium on Foundations of Computer Science (FOCS) 2023 (opens in new tab), a premier forum for the latest research in theoretical computer science. Submodular functions are versatile mathematical tools, finding diverse applications in real-world scenarios and guiding solutions across complex domains. From dissecting the intricate networks […]
The post Toward developing faster algorithms for minimizing submodular functions appeared first on Microsoft Research.
( 10
min )
Taiwanese artist Steven Tung creates captivating 2D and 3D digital art that explores sci-fi, minimalism and realism and pushes artistic boundaries.
( 6
min )
The expressivity of Graph Neural Networks (GNNs) can be entirely
characterized by appropriate fragments of the first-order logic. Namely, any
query of the two variable fragment of graded modal logic (GC2) interpreted over
labeled graphs can be expressed using a GNN whose size depends only on the
depth of the query. As pointed out by [Barceló et al., 2020; Grohe, 2021], this
description holds for a family of activation functions, leaving the possibility
for a hierarchy of logics expressible by GNNs depending on the chosen
activation function. In this article, we show that such hierarchy indeed exists
by proving that GC2 queries cannot be expressed by GNNs with polynomial
activation functions. This implies a separation between polynomial and popular
non-polynomial activations (such as ReLU, sigmoid, and hyperbolic tangent) and
answers an open question formulated by [Grohe, 2021].
( 2
min )
Quantifying the difference between two probability density functions, $p$ and
$q$, using available data, is a fundamental problem in Statistics and Machine
Learning. A usual approach for addressing this problem is the likelihood-ratio
estimation (LRE) between $p$ and $q$, which -- to the best of our knowledge -- has
been investigated mainly for the offline case. This paper contributes by
introducing a new framework for online non-parametric LRE (OLRE) for the
setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are
observed over time. The non-parametric nature of our approach has the advantage
of being agnostic to the forms of $p$ and $q$. Moreover, we capitalize on the
recent advances in Kernel Methods and functional minimization to develop an
estimator that can be efficiently updated online. We provide theoretical
guarantees for the performance of the OLRE method along with empirical
validation in synthetic experiments.
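The online estimation idea can be sketched with a least-squares importance-fitting objective minimized by stochastic gradient descent over random Fourier features. This is a hypothetical stand-in for the paper's kernel estimator: the feature count, learning rate, regularization, and the Gaussian choices of $p$ and $q$ are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, sigma = 200, 1.0
W = rng.standard_normal(D) / sigma  # 1-D inputs, Gaussian-kernel bandwidth sigma
B = rng.uniform(0, 2 * np.pi, D)

def phi(x):
    # Random Fourier features approximating a Gaussian kernel
    return np.sqrt(2.0 / D) * np.cos(W * x + B)

w = np.zeros(D)
lr, lam = 0.05, 1e-3
for t in range(5000):
    x_p = rng.normal(1.0, 1.0)  # x_t  ~ p = N(1, 1)
    x_q = rng.normal(0.0, 1.0)  # x'_t ~ q = N(0, 1)
    fq = phi(x_q)
    # Stochastic gradient of E_q[r^2]/2 - E_p[r] + (lam/2)||w||^2,
    # whose population minimizer is the ratio r = p/q
    w -= lr * ((w @ fq) * fq - phi(x_p) + lam * w)

ratio = lambda x: max(w @ phi(x), 0.0)  # clip to keep the estimate nonnegative
```

Each pair of observations triggers one cheap weight update, so the estimator stays current as data streams in; the true ratio here is $\exp(x - 1/2)$, increasing in $x$, and the online estimate should preserve that ordering.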
( 2
min )
An emerging paradigm for solving inverse problems is the use of deep
learning to learn a regularizer from data. This leads to high-quality results,
but often at the cost of provable guarantees. In this work, we show how
well-posedness and convergent regularization arises within the convex-nonconvex
(CNC) framework for inverse problems. We introduce a novel input weakly convex
neural network (IWCNN) construction to adapt the method of learned adversarial
regularization to the CNC framework. Empirically we show that our method
overcomes numerical issues of previous adversarial methods.
( 2
min )
Optical computing systems can provide high-speed and low-energy data
processing but face two challenges: computationally demanding training and a
simulation-to-reality gap. We propose a model-free solution for lightweight in
situ optimization of optical computing systems based on the score gradient
estimation algorithm. This approach treats the system as a black box and
back-propagates loss directly to the optical weights' probabilistic
distributions, hence circumventing the need for computation-heavy and biased
system simulation. We demonstrate a superior classification accuracy on the
MNIST and FMNIST datasets through experiments on a single-layer diffractive
optical computing system. Furthermore, we show its potential for image-free and
high-speed cell analysis. The inherent simplicity of our proposed method,
combined with its low demand for computational resources, expedites the
transition of optical computing from laboratory demonstrations to real-world
applications.
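The score-function gradient estimator at the heart of this approach can be sketched on a stand-in black-box objective. The quadratic loss, the Gaussian weight distribution, and all hyperparameters below are placeholders for the physical optical system and its measured loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box_loss(w):
    # Stand-in for the system's measured loss; its gradient is unavailable,
    # just as the physical optical system cannot be back-propagated through.
    return np.sum((w - 3.0) ** 2)

mu, sigma = np.zeros(5), 0.5  # probabilistic weight distribution N(mu, sigma^2)
lr, batch = 0.05, 32
for step in range(500):
    eps = rng.standard_normal((batch, 5))
    w = mu + sigma * eps  # sample candidate weights
    losses = np.array([black_box_loss(wi) for wi in w])
    baseline = losses.mean()  # variance-reduction baseline
    # Score-function (REINFORCE) estimator:
    # grad_mu E[L] = E[L * grad_mu log N(w; mu, sigma)], with score eps / sigma
    grad_mu = ((losses - baseline)[:, None] * eps / sigma).mean(axis=0)
    mu -= lr * grad_mu
```

Because only loss evaluations are needed, the loss is "back-propagated" directly to the parameters of the weight distribution without any differentiable simulation of the system.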
( 2
min )
Restricting the variance of a policy's return is a popular choice in
risk-averse Reinforcement Learning (RL) due to its clear mathematical
definition and easy interpretability. Traditional methods directly restrict the
total return variance. Recent methods restrict the per-step reward variance as
a proxy. We thoroughly examine the limitations of these variance-based methods,
such as sensitivity to numerical scale and hindering of policy learning, and
propose to use an alternative risk measure, Gini deviation, as a substitute. We
study various properties of this new risk measure and derive a policy gradient
algorithm to minimize it. Empirical evaluation in domains where risk-aversion
can be clearly defined shows that our algorithm can mitigate the limitations
of variance-based risk measures and achieves high return with low risk in terms
of variance and Gini deviation when others fail to learn a reasonable policy.
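Gini deviation can be estimated from return samples as half the mean absolute difference between two independent copies. A small sketch follows; the two Gaussian "policies" are synthetic illustrations, not outputs of the paper's algorithm.

```python
import numpy as np

def gini_deviation(returns):
    """Gini deviation (1/2) E|X - X'|, estimated from samples as
    (1 / (2 n^2)) * sum_ij |x_i - x_j|."""
    x = np.asarray(returns, dtype=float)
    return np.abs(x[:, None] - x[None, :]).mean() / 2.0

rng = np.random.default_rng(0)
safe = rng.normal(10.0, 0.5, size=2000)   # policy with steady returns
risky = rng.normal(10.0, 5.0, size=2000)  # same mean return, far more spread

gd_safe, gd_risky = gini_deviation(safe), gini_deviation(risky)
```

Unlike variance, which scales quadratically with the returns, Gini deviation is positively homogeneous (doubling all returns exactly doubles the deviation), which relates to the sensitivity to numerical scale that the abstract identifies in variance-based methods.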
( 2
min )
We show how to "compile" human-readable programs into standard decoder-only
transformer models. Our compiler, Tracr, generates models with known structure.
This structure can be used to design experiments. For example, we use it to
study "superposition" in transformers that execute multi-step algorithms.
Additionally, the known structure of Tracr-compiled models can serve as
ground-truth for evaluating interpretability methods. Commonly, because the
"programs" learned by transformers are unknown, it is unclear whether an
interpretation has succeeded. We demonstrate our approach by implementing and
examining programs including computing token frequencies, sorting, and
parenthesis checking. We provide an open-source implementation of Tracr at
https://github.com/google-deepmind/tracr.
( 2
min )
In gradient descent dynamics of neural networks, the top eigenvalue of the
Hessian of the loss (sharpness) displays a variety of robust phenomena
throughout training. This includes an early-time regime where the sharpness may
decrease (sharpness reduction), and later-time behavior such as progressive
sharpening and the edge of stability. We demonstrate
that a simple $2$-layer linear network (UV model) trained on a single training
example exhibits all of the essential sharpness phenomenology observed in
real-world scenarios. By analyzing the structure of dynamical fixed points in
function space and the vector field of function updates, we uncover the
underlying mechanisms behind these sharpness trends. Our analysis reveals (i)
the mechanism behind early sharpness reduction and progressive sharpening, (ii)
the required conditions for edge of stability, and (iii) a period-doubling
route to chaos on the edge of stability manifold as learning rate is increased.
Finally, we demonstrate that various predictions from this simplified model
generalize to real-world scenarios and discuss its limitations.
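A scalar instance of the UV model makes this phenomenology easy to reproduce: with loss $L = (uv - y)^2/2$, the Hessian with respect to $(u, v)$ is available in closed form, so its top eigenvalue (the sharpness) can be tracked along gradient descent. The initialization, learning rate, and target below are illustrative choices, and the paper's exact setup and scaling may differ.

```python
import numpy as np

def loss_and_sharpness(u, v, y):
    r = u * v - y
    # Hessian of L = r^2 / 2 with respect to (u, v):
    # d2L/du2 = v^2, d2L/dv2 = u^2, d2L/dudv = 2uv - y
    H = np.array([[v * v, 2 * u * v - y],
                  [2 * u * v - y, u * u]])
    return 0.5 * r * r, np.linalg.eigvalsh(H)[-1]

y, lr = 4.0, 0.05
u, v = 0.3, 0.25  # small initialization: sharpness grows as |u|, |v| grow
sharps = []
for _ in range(200):
    r = u * v - y
    u, v = u - lr * r * v, v - lr * r * u  # simultaneous gradient descent step
    _, s = loss_and_sharpness(u, v, y)
    sharps.append(s)
```

From a small initialization the sharpness rises during training (progressive sharpening) and settles near its value at the solution; raising the learning rate until $\mathrm{lr} \times \mathrm{sharpness}$ approaches 2 would push this dynamics onto the edge of stability.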
( 2
min )
A stochastic process that arises by composing a function with a Markov
process is called an aggregated Markov process (AMP). The purpose of composing
a Markov process with a function can be a reduction of dimensions, e.g., a
projection onto certain coordinates. The theory of AMPs has been extensively
studied, e.g., by Dynkin, Cameron, Rogers and Pitman, and Kelly, all of whom
provided sufficient conditions for an AMP to remain Markov. In another
direction, Larget provided a canonical representation for AMP, which can be
used to verify the equivalence of two AMPs. The purpose of this paper is to
describe how the theory of AMPs can be applied to stochastic learning models as
they learn a particular task.
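One classical sufficient condition for an aggregated chain to remain Markov is strong lumpability (Kemeny and Snell): within each block of the partition, every state must have the same total transition probability into each block. A small numerical check, with a toy transition matrix and partition chosen purely for illustration:

```python
import numpy as np

def is_lumpable(P, partition):
    """Strong lumpability check for a transition matrix P and a state
    partition: rows within a block must have equal aggregated mass
    into every block."""
    for block in partition:
        for target in partition:
            mass = P[np.ix_(block, target)].sum(axis=1)
            if not np.allclose(mass, mass[0]):
                return False
    return True

P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.5, 0.2],   # states 0 and 1 both send mass 0.2 to state 2
              [0.1, 0.1, 0.8]])
partition = [[0, 1], [2]]        # aggregate via f({0, 1}) = A, f({2}) = B

P_bad = P.copy()
P_bad[1] = [0.4, 0.3, 0.3]       # breaks the condition: 0.2 vs 0.3 into block B
```

When the condition holds, the aggregated process is itself Markov; when it fails, the AMP generally is not, which is exactly the distinction the sufficient conditions cited above address.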
( 2
min )
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. Queries is a feature that enables you to extract specific pieces of information from varying, complex documents using natural language. Custom Queries provides a way for you to customize the Queries feature for your business-specific, non-standard documents […]
( 9
min )
We are excited to announce that Amazon SageMaker JumpStart can now stream large language model (LLM) inference responses. Token streaming allows you to see the model response output as it is being generated instead of waiting for LLMs to finish the response generation before it is made available for you to use or display. The […]
( 7
min )
GPT-4 Turbo with 128K context and lower prices, the new Assistants API, GPT-4 Turbo with Vision, DALL·E 3 API, and more.
( 7
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )